Introduction



During the past few years, several of us have been asked many times about
references on finite tree automata. On the one hand, this attests to the liveliness of
this field. On the other hand, the question was difficult to answer. Besides several excellent
survey chapters on more specific topics, there is only one monograph devoted
to tree automata, by Gécseg and Steinby. Unfortunately, it is now impossible
to find a copy of it, and a lot of work has been done on tree automata since
the publication of that book. Indeed, using tree automata has proved to be a
powerful approach to simplify and extend previously known results, and also to
find new results. For instance, recent works use tree automata for applications
in abstract interpretation using set constraints, rewriting, automated theorem
proving and program verification, databases and XML schema languages.

Tree automata were designed a long time ago in the context of circuit
verification. Many famous researchers contributed to this school, which was
headed by A. Church in the late 1950s and the early 1960s: B. Trakhtenbrot,
J.R. Büchi, M.O. Rabin, Doner, Thatcher, etc. Many new ideas came out of
this program, for instance the connections between automata and logic. Tree
automata also first appeared in this framework, following the work of Doner,
Thatcher and Wright. In the 1970s many new results were established concerning
tree automata, which lost some of their connection with applications and were
studied for their own sake. In particular, one problem was the very high complexity
of decision procedures for monadic second-order logic. Applications of tree
automata to program verification revived in the 1980s, after the relative failure
of automated deduction in this field. It is possible to verify temporal logic
formulas (which are particular monadic second-order formulas) on simpler
(small) programs. Automata, and in particular tree automata, also appeared
as an approximation of programs on which fully automated tools can be used.
New results were obtained connecting properties of programs, type systems
or rewrite systems with automata.

Our goal is to fill the existing gap and to provide a textbook which presents
the basics of tree automata and several variants of tree automata which have
been devised for applications in the aforementioned domains. We shall discuss
only finite tree automata, and the reader interested in infinite trees should
consult any recent survey on automata on infinite objects and their applications
(see the bibliography). The second main restriction is that we focus on
the operational aspects of tree automata. This book should appeal to the reader
who wants a simple presentation of the basics of tree automata, and
who wants to see how some variations on the idea of tree automata have provided a nice
tool for solving difficult problems. Specialists of the domain therefore probably
know almost all the material included. However, we think that this book can
be helpful for many researchers who need some knowledge of tree automata.
This is typically the case of a PhD student, who may find new ideas and guess
connections with his or her own work.

Again, we recall that there is no presentation or discussion of tree automata
for infinite trees. This domain is also in full development, mainly due to
applications in program verification, and several surveys on this topic exist. We
have tried to present a tool and the algorithms devised for this tool. Therefore,
most of the proofs that we give are constructive and we have tried to give as
many complexity results as possible. We do not claim to present an exhaustive
description of all possible finite tree automata already presented in the literature,
and we made some choices in the existing menagerie of tree automata. Although
some works are not described thoroughly (they are usually covered in
exercises), we think that the content of this book gives a good flavor of what can
be done with the simple ideas supporting tree automata.

This book is an open work and we want it to be as interactive as possible. 
Readers and specialists are invited to provide suggestions and improvements. 
Submissions of contributions to new chapters and improvements of existing ones 
are welcome. 

Among our choices, let us mention that we have not defined any
precise language for describing algorithms, which are given in a
pseudo-algorithmic language. Also, there are no citations in the text, but each chapter ends
with a section devoted to bibliographical notes where credit is given to the
relevant authors. Exercises are also presented at the end of each chapter.

Tree Automata Techniques and Applications is composed of seven main
chapters (numbered 1-7). The first one presents tree automata and defines
recognizable tree languages. The reader will find the classical algorithms and
the classical closure properties of the class of recognizable tree languages.
Complexity results are given when they are available. The second chapter gives
an alternative presentation of recognizable tree languages which may be more
relevant in some situations. This includes regular tree grammars, regular tree
expressions and regular equations. The description of properties relating
regular tree languages and context-free word languages forms the last part of this
chapter. In Chapter 3, we show the deep connections between logic and
automata. In particular, we prove in full detail the correspondence between finite
tree automata and the weak monadic second-order logic with k successors. We
also sketch several applications in various domains.

Chapter 4 presents a basic variation of automata, more precisely automata
with equality constraints. An equality constraint restricts the application of
rules to trees where some subtrees are equal (with respect to some equality
relation). Therefore we can discriminate more easily between trees that we
want to accept and trees that we must reject. Several kinds of constraints are
described, both originating from the problem of non-linearity in trees (the same
variable may occur at different positions).

In Chapter 5 we consider automata which recognize sets of sets of terms.
Such automata appeared in the context of set constraints, which themselves are
used in program analysis. The idea is to consider, for each variable or each
predicate symbol occurring in a program, the set of its possible values. The
program gives constraints that these sets must satisfy. Solving the constraints
gives an upper approximation of the values that a given variable can take. Such
an approximation can be used to detect errors at compile time: it acts exactly as
a typing system which would be inferred from the program. Tree set automata
(as we call them) recognize the sets of solutions of such constraints (hence sets
of sets of trees). In this chapter we study the properties of tree set automata
and their relationship with program analysis.

Originally, automata were invented as an intermediate between the description
of functions and their implementation by a circuit. The main related problem in
the sixties was the synthesis problem: which arithmetic recursive functions can
be achieved by a circuit? So far, we only considered tree automata which accept
sets of trees, sets of tuples of trees (Chapter 3) or sets of sets of trees
(Chapter 5). However, tree automata can also be used as a computational device.
This is the subject of Chapter 6, where we study tree transducers.

Preliminaries



Terms 

We denote by $N$ the set of positive integers. We denote the set of finite strings
over $N$ by $N^*$. The empty string is denoted by $\varepsilon$.

A ranked alphabet is a pair $(\mathcal{F}, \mathit{Arity})$ where $\mathcal{F}$ is a finite set and $\mathit{Arity}$ is
a mapping from $\mathcal{F}$ into $N$. The arity of a symbol $f \in \mathcal{F}$ is $\mathit{Arity}(f)$. The set of
symbols of arity $p$ is denoted by $\mathcal{F}_p$. Elements of arity $0, 1, \ldots, p$ are respectively
called constants, unary, ..., $p$-ary symbols. We assume that $\mathcal{F}$ contains at least
one constant. In the examples, we use parentheses and commas for a short
declaration of symbols with arity. For instance, $f(,)$ is a short declaration for a
binary symbol $f$.

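Let $\mathcal{X}$ be a set of constants called variables. We assume that the sets $\mathcal{X}$
and $\mathcal{F}_0$ are disjoint. The set $T(\mathcal{F}, \mathcal{X})$ of terms over the ranked alphabet $\mathcal{F}$ and
the set of variables $\mathcal{X}$ is the smallest set defined by:

- $\mathcal{F}_0 \subseteq T(\mathcal{F}, \mathcal{X})$ and

- $\mathcal{X} \subseteq T(\mathcal{F}, \mathcal{X})$ and

- if $p \geq 1$, $f \in \mathcal{F}_p$ and $t_1, \ldots, t_p \in T(\mathcal{F}, \mathcal{X})$, then $f(t_1, \ldots, t_p) \in T(\mathcal{F}, \mathcal{X})$.

If $\mathcal{X} = \emptyset$ then $T(\mathcal{F}, \mathcal{X})$ is also written $T(\mathcal{F})$. Terms in $T(\mathcal{F})$ are called
ground terms. A term $t$ in $T(\mathcal{F}, \mathcal{X})$ is linear if each variable occurs at most
once in $t$.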

Example 1. Let $\mathcal{F} = \{cons(,), nil, a\}$ and $\mathcal{X} = \{x, y\}$. Here $cons$ is a
binary symbol, $nil$ and $a$ are constants. The term $cons(x, y)$ is linear; the
term $cons(x, cons(x, nil))$ is non linear; the term $cons(a, cons(a, nil))$ is a ground
term. Terms can be represented in a graphical way. For instance, the term
$cons(a, cons(a, nil))$ is represented by:

      cons
     /    \
    a     cons
         /    \
        a     nil
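
As a minimal sketch of these definitions (not part of the original text), terms can be encoded in Python as nested tuples. The encoding is an assumption made for illustration: a term $f(t_1, \ldots, t_n)$ is the tuple `(f, t_1, ..., t_n)`, a constant is a one-element tuple, and a variable is a plain string. The names `ARITY`, `VARIABLES`, `is_term`, `is_ground` and `is_linear` are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a term is a tuple (symbol, child_1, ..., child_n);
# a variable is a plain string.
ARITY = {"cons": 2, "nil": 0, "a": 0}   # the ranked alphabet of Example 1
VARIABLES = {"x", "y"}

def is_term(t):
    """Check that t is a well-formed term over ARITY and VARIABLES."""
    if isinstance(t, str):
        return t in VARIABLES
    symbol, *children = t
    return ARITY.get(symbol) == len(children) and all(is_term(c) for c in children)

def variables_of(t):
    """Count the occurrences of each variable in t."""
    if isinstance(t, str):
        return Counter([t])
    counts = Counter()
    for child in t[1:]:
        counts += variables_of(child)
    return counts

def is_ground(t):
    """A ground term contains no variable."""
    return not variables_of(t)

def is_linear(t):
    """A linear term contains each variable at most once."""
    return all(n <= 1 for n in variables_of(t).values())

t1 = ("cons", "x", "y")
t2 = ("cons", "x", ("cons", "x", ("nil",)))
t3 = ("cons", ("a",), ("cons", ("a",), ("nil",)))
```

With this encoding, `t1`, `t2` and `t3` are the three terms discussed in Example 1.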



Terms and Trees

A finite ordered tree $t$ over a set of labels $E$ is a mapping from a prefix-closed
set $\mathcal{P}os(t) \subseteq N^*$ into $E$. Thus, a term $t \in T(\mathcal{F}, \mathcal{X})$ may be viewed as a finite
ordered ranked tree, the leaves of which are labeled with variables or constant
symbols and the internal nodes are labeled with symbols of positive arity, with
out-degree equal to the arity of the label, i.e. a term $t \in T(\mathcal{F}, \mathcal{X})$ can also be
defined as a partial function $t : N^* \to \mathcal{F} \cup \mathcal{X}$ with domain $\mathcal{P}os(t)$ satisfying the
following properties:

(i) $\mathcal{P}os(t)$ is nonempty and prefix-closed.

(ii) $\forall p \in \mathcal{P}os(t)$, if $t(p) \in \mathcal{F}_n$, $n \geq 1$, then $\{j \mid pj \in \mathcal{P}os(t)\} = \{1, \ldots, n\}$.

(iii) $\forall p \in \mathcal{P}os(t)$, if $t(p) \in \mathcal{X} \cup \mathcal{F}_0$, then $\{j \mid pj \in \mathcal{P}os(t)\} = \emptyset$.

We identify terms and trees, that is, we only consider finite ordered ranked trees
satisfying (i), (ii) and (iii). The reader should note that finite ordered trees with
bounded rank $k$ (i.e. there is a bound $k$ on the out-degrees of internal nodes)
can be encoded in finite ordered ranked trees: a label $e \in E$ is associated with
$k$ symbols $(e, 1)$ of arity 1, ..., $(e, k)$ of arity $k$.

Each element in $\mathcal{P}os(t)$ is called a position. A frontier position is a
position $p$ such that $\forall j \in N$, $pj \notin \mathcal{P}os(t)$. The set of frontier positions is
denoted by $\mathcal{FP}os(t)$. Each position $p$ in $t$ such that $t(p) \in \mathcal{X}$ is called a variable
position. The set of variable positions of $t$ is denoted by $\mathcal{VP}os(t)$. We denote
by $\mathcal{H}ead(t)$ the root symbol of $t$, which is defined by $\mathcal{H}ead(t) = t(\varepsilon)$.

SubTerms 

A subterm $t|_p$ of a term $t \in T(\mathcal{F}, \mathcal{X})$ at position $p$ is defined by the following:

- $\mathcal{P}os(t|_p) = \{j \mid pj \in \mathcal{P}os(t)\}$,

- $\forall q \in \mathcal{P}os(t|_p)$, $t|_p(q) = t(pq)$.

We denote by $t[u]_p$ the term obtained by replacing in $t$ the subterm $t|_p$ by
$u$.

We denote by $\unrhd$ the subterm ordering, i.e. we write $t \unrhd t'$ if $t'$ is a subterm
of $t$. We write $t \rhd t'$ if $t \unrhd t'$ and $t \neq t'$.

A set of terms $F$ is said to be closed if it is closed under the subterm
ordering, i.e. $\forall t \in F$ $(t \unrhd t' \Rightarrow t' \in F)$.

Functions on Terms 

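The size of a term $t$, denoted by $\|t\|$, and the height of $t$, denoted by $\mathcal{H}eight(t)$,
are inductively defined by:

- $\mathcal{H}eight(t) = 0$, $\|t\| = 0$ if $t \in \mathcal{X}$,

- $\mathcal{H}eight(t) = 1$, $\|t\| = 1$ if $t \in \mathcal{F}_0$,

- $\mathcal{H}eight(t) = 1 + \max(\{\mathcal{H}eight(t_i) \mid i \in \{1, \ldots, n\}\})$, $\|t\| = 1 + \sum_{i \in \{1, \ldots, n\}} \|t_i\|$
if $\mathcal{H}ead(t) \in \mathcal{F}_n$.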



Example 2. Let $\mathcal{F} = \{f(,,), g(,), h(), a, b\}$ and $\mathcal{X} = \{x, y\}$. Consider the
terms

    t = f(g(a, b), a, h(b))        t' = f(g(x, y), a, g(x, y))

The root symbol of $t$ is $f$; the set of frontier positions of $t$ is $\{11, 12, 2, 31\}$; the
set of variable positions of $t'$ is $\{11, 12, 31, 32\}$; $t|_3 = h(b)$; $t[a]_3 = f(g(a, b), a, a)$;
$\mathcal{H}eight(t) = 3$; $\mathcal{H}eight(t') = 2$; $\|t\| = 7$; $\|t'\| = 4$.
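
The functions on terms defined above can be sketched in a few lines of Python, which makes it possible to recompute the values of Example 2. This is an illustration under assumptions, not part of the original text: terms are nested tuples `(symbol, child_1, ..., child_n)`, variables are plain strings, positions are tuples of 1-based child indices (so position 31 is `(3, 1)`), and all function names are hypothetical.

```python
def positions(t):
    """Pos(t): all positions of t, the root being the empty tuple."""
    result = [()]
    if not isinstance(t, str):
        for i, child in enumerate(t[1:], start=1):
            result += [(i,) + p for p in positions(child)]
    return result

def subterm(t, p):
    """t|_p: the subterm of t at position p."""
    for i in p:
        t = t[i]          # child i sits at tuple index i
    return t

def replace(t, p, u):
    """t[u]_p: t with the subterm at position p replaced by u."""
    if not p:
        return u
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], u),) + t[i + 1:]

def frontier_positions(t):
    """Positions of the leaves (variables and constants)."""
    return [p for p in positions(t)
            if isinstance(subterm(t, p), str) or len(subterm(t, p)) == 1]

def variable_positions(t):
    """Positions labeled by a variable."""
    return [p for p in positions(t) if isinstance(subterm(t, p), str)]

def height(t):
    """0 for a variable, 1 for a constant, 1 + max over children otherwise."""
    if isinstance(t, str):
        return 0
    return 1 + max((height(c) for c in t[1:]), default=0)

def size(t):
    """Number of non-variable symbols in t."""
    if isinstance(t, str):
        return 0
    return 1 + sum(size(c) for c in t[1:])

t = ("f", ("g", ("a",), ("b",)), ("a",), ("h", ("b",)))
t_prime = ("f", ("g", "x", "y"), ("a",), ("g", "x", "y"))
```

Here `t` and `t_prime` encode the two terms of Example 2.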



Substitutions 

A substitution (respectively a ground substitution) $\sigma$ is a mapping from $\mathcal{X}$
into $T(\mathcal{F}, \mathcal{X})$ (respectively into $T(\mathcal{F})$) where there are only finitely many
variables not mapped to themselves. The domain of a substitution $\sigma$ is the subset
of variables $x \in \mathcal{X}$ such that $\sigma(x) \neq x$. The substitution $\{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}$
is the identity on $\mathcal{X} \setminus \{x_1, \ldots, x_n\}$ and maps $x_i \in \mathcal{X}$ to $t_i \in T(\mathcal{F}, \mathcal{X})$, for every
index $1 \leq i \leq n$. Substitutions can be extended to $T(\mathcal{F}, \mathcal{X})$ in such a way that:

$\forall f \in \mathcal{F}_n$, $\forall t_1, \ldots, t_n \in T(\mathcal{F}, \mathcal{X})$: $\sigma(f(t_1, \ldots, t_n)) = f(\sigma(t_1), \ldots, \sigma(t_n))$.

We identify a substitution with its extension to $T(\mathcal{F}, \mathcal{X})$. Substitutions will
often be used in postfix notation: $t\sigma$ is the result of applying $\sigma$ to the term $t$.



Example 3. Let $\mathcal{F} = \{f(,,), g(,), a, b\}$ and $\mathcal{X} = \{x_1, x_2\}$. Let us consider
the term $t = f(x_1, x_1, x_2)$. Let us consider the ground substitution
$\sigma = \{x_1 \leftarrow a, x_2 \leftarrow g(b, b)\}$ and the substitution $\sigma' = \{x_1 \leftarrow x_2, x_2 \leftarrow b\}$. Then

$t\sigma = t\{x_1 \leftarrow a, x_2 \leftarrow g(b, b)\} = f(a, a, g(b, b))$ ;  $t\sigma' = t\{x_1 \leftarrow x_2, x_2 \leftarrow b\} = f(x_2, x_2, b)$.
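
The extension of a substitution to terms is a one-line recursion. The sketch below is an illustration only and makes some assumptions: terms are nested tuples `(symbol, child_1, ..., child_n)`, variables are strings, a substitution is a Python dict from variable names to terms, and `apply` is a hypothetical name.

```python
def apply(t, sigma):
    """Return t sigma, the result of applying the substitution sigma to t."""
    if isinstance(t, str):
        return sigma.get(t, t)     # variables outside the domain are kept
    symbol, *children = t
    return (symbol, *(apply(c, sigma) for c in children))

t = ("f", "x1", "x1", "x2")
sigma = {"x1": ("a",), "x2": ("g", ("b",), ("b",))}
sigma_prime = {"x1": "x2", "x2": ("b",)}
```

Note that the replacement is simultaneous: in `apply(t, sigma_prime)` the occurrences of `x2` introduced for `x1` are not rewritten again to `b`, in agreement with Example 3.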



Contexts 

Let $\mathcal{X}_n$ be a set of $n$ variables. A linear term $C \in T(\mathcal{F}, \mathcal{X}_n)$ is called a context
and the expression $C[t_1, \ldots, t_n]$ for $t_1, \ldots, t_n \in T(\mathcal{F})$ denotes the term in $T(\mathcal{F})$
obtained from $C$ by replacing variable $x_i$ by $t_i$ for each $1 \leq i \leq n$, that is
$C[t_1, \ldots, t_n] = C\{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}$. We denote by $\mathcal{C}^n(\mathcal{F})$ the set of contexts
over $(x_1, \ldots, x_n)$.

We denote by $\mathcal{C}(\mathcal{F})$ the set of contexts containing a single variable. A context
is trivial if it is reduced to a variable. Given a context $C \in \mathcal{C}(\mathcal{F})$, we denote
by $C^0$ the trivial context, $C^1$ is equal to $C$ and, for $n > 1$, $C^n = C^{n-1}[C]$ is a
context in $\mathcal{C}(\mathcal{F})$.
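
The iteration of a one-variable context can be made concrete with a small Python sketch. It is an illustration under assumptions: terms are nested tuples, the single variable of a context is the string `"x"` (stored in the hypothetical constant `HOLE`), and `plug` and `power` are hypothetical names.

```python
HOLE = "x"   # hypothetical name for the single variable of a context in C(F)

def plug(context, t):
    """C[t]: replace the hole of the context by t."""
    if context == HOLE:
        return t
    if isinstance(context, str):
        return context
    symbol, *children = context
    return (symbol, *(plug(c, t) for c in children))

def power(context, n):
    """C^n, with C^0 the trivial context and C^n = C^(n-1)[C]."""
    result = HOLE
    for _ in range(n):
        result = plug(result, context)
    return result

C = ("f", "x", ("a",))
```

For instance, `power(C, 2)` nests `C` inside its own hole, and plugging a ground term into `power(C, n)` yields a ground term.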

Chapter 1

Recognizable Tree Languages and Finite Tree Automata



In this chapter, we present basic results on finite tree automata in the style of
the undergraduate textbook on finite automata by Hopcroft and Ullman [HU79].
Finite tree automata deal with finite ordered ranked trees or finite ordered trees
with bounded rank. We discuss unordered and/or unranked finite trees in the
bibliographic notes (Section 1.9). We assume that the reader is familiar with
finite automata. Words over a finite alphabet can be viewed as unary terms. For
instance, a word $abb$ over $A = \{a, b\}$ can be viewed as a unary term $t = a(b(b(\sharp)))$
over the ranked alphabet $\mathcal{F} = \{a(), b(), \sharp\}$ where $\sharp$ is a new constant symbol.
The theory of tree automata arises as a straightforward extension of the theory
of word automata when words are viewed as unary terms.
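
The encoding of words as unary terms can be sketched as follows. This is an illustration only: terms are nested tuples, the new constant symbol is written `"#"` (stored in the hypothetical constant `END`), and the function names are assumptions of this sketch.

```python
END = "#"   # hypothetical name for the new constant symbol

def word_to_term(word):
    """Encode a word as a unary term, first letter at the root."""
    term = (END,)
    for letter in reversed(word):
        term = (letter, term)
    return term

def term_to_word(term):
    """Inverse encoding: read the letters from the root down to END."""
    letters = []
    while term != (END,):
        letters.append(term[0])
        term = term[1]
    return "".join(letters)
```

With this encoding the word abb becomes the nested tuple for a(b(b(#))), and the empty word becomes the constant alone.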

In Section 1.1, we define bottom-up finite tree automata where "bottom-up"
has the following sense: assuming a graphical representation of trees or ground
terms with the root symbol at the top, an automaton starts its computation at
the leaves and moves upward. Recognizable tree languages are the languages
recognized by some finite tree automaton. We consider the deterministic case and
the nondeterministic case and prove their equivalence. In Section 1.2, we prove
a pumping lemma for recognizable tree languages. This lemma is useful for
proving that some tree languages are not recognizable. In Section 1.3, we prove
the basic closure properties for set operations. In Section 1.4, we define tree
homomorphisms and study the closure properties under these tree
transformations. In this section the first difference between the word case and the tree case
appears. Indeed, recognizable word languages are closed under homomorphisms
but recognizable tree languages are closed only under a subclass of tree
homomorphisms: linear homomorphisms, where duplication of trees is forbidden. We
will see throughout this textbook that non-linearity is one of the main difficulties
for the tree case. In Section 1.5, we prove a Myhill-Nerode Theorem for tree
languages and the existence of a unique minimal automaton. In Section 1.6, we
define top-down tree automata. A second difference with the word case appears
because it is proved that deterministic top-down tree automata are strictly less
powerful than nondeterministic ones. The last section of the present chapter
gives a list of complexity results.



1.1 Finite Tree Automata 

Nondeterministic Finite Tree Automata 

A finite tree automaton (NFTA) over $\mathcal{F}$ is a tuple $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ where
$Q$ is a set of (unary) states, $Q_f \subseteq Q$ is a set of final states, and $\Delta$ is a set of
transition rules of the following type:

$f(q_1(x_1), \ldots, q_n(x_n)) \to q(f(x_1, \ldots, x_n))$,

where $n \geq 0$, $f \in \mathcal{F}_n$, $q, q_1, \ldots, q_n \in Q$, $x_1, \ldots, x_n \in \mathcal{X}$.

Tree automata over $\mathcal{F}$ run on ground terms over $\mathcal{F}$. An automaton starts
at the leaves and moves upward, associating along a run a state with each
subterm inductively. Let us note that there is no initial state in a NFTA, but,
when $n = 0$, i.e. when the symbol is a constant symbol $a$, a transition rule is
of the form $a \to q(a)$. Therefore, the transition rules for the constant symbols
can be considered as the "initial rules". If the direct subterms $u_1, \ldots, u_n$ of
$t = f(u_1, \ldots, u_n)$ are labeled with states $q_1, \ldots, q_n$, then the term $t$ will be
labeled by some state $q$ with $f(q_1(x_1), \ldots, q_n(x_n)) \to q(f(x_1, \ldots, x_n)) \in \Delta$. We
now formally define the move relation defined by a NFTA.

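Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a NFTA over $\mathcal{F}$. The move relation $\to_{\mathcal{A}}$ is
defined by: let $t, t' \in T(\mathcal{F} \cup Q)$,

\[
t \to_{\mathcal{A}} t' \iff
\begin{cases}
\exists C \in \mathcal{C}(\mathcal{F} \cup Q),\ \exists u_1, \ldots, u_n \in T(\mathcal{F}), \\
\exists f(q_1(x_1), \ldots, q_n(x_n)) \to q(f(x_1, \ldots, x_n)) \in \Delta, \\
t = C[f(q_1(u_1), \ldots, q_n(u_n))], \\
t' = C[q(f(u_1, \ldots, u_n))].
\end{cases}
\]

$\xrightarrow{*}_{\mathcal{A}}$ is the reflexive and transitive closure of $\to_{\mathcal{A}}$.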



Example 4. Let $\mathcal{F} = \{f(,), g(), a\}$. Consider the automaton $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$
defined by: $Q = \{q_a, q_g, q_f\}$, $Q_f = \{q_f\}$, and $\Delta$ is the following set of transition
rules:

{ $a \to q_a(a)$    $g(q_a(x)) \to q_g(g(x))$
  $g(q_g(x)) \to q_g(g(x))$    $f(q_g(x), q_g(y)) \to q_f(f(x, y))$ }

We give two examples of reductions with the move relation $\to_{\mathcal{A}}$:

$f(a, a) \to_{\mathcal{A}} f(q_a(a), a) \to_{\mathcal{A}} f(q_a(a), q_a(a))$

$f(g(a), g(a)) \xrightarrow{*}_{\mathcal{A}} f(g(q_a(a)), g(q_a(a))) \xrightarrow{*}_{\mathcal{A}} f(q_g(g(a)), q_g(g(a))) \to_{\mathcal{A}} q_f(f(g(a), g(a)))$



A ground term $t$ in $T(\mathcal{F})$ is accepted by a finite tree automaton
$\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ if

$t \xrightarrow{*}_{\mathcal{A}} q(t)$

for some state $q$ in $Q_f$. The reader should note that our definition corresponds
to the notion of nondeterministic finite tree automaton because our finite tree
automaton model allows zero, one or more transition rules with the same
left-hand side. Therefore there may be more than one reduction starting from
the same ground term, and a ground term $t$ is accepted if there is one reduction
(among all possible reductions) starting from this ground term and leading to a
configuration of the form $q(t)$ where $q$ is a final state. The tree language $L(\mathcal{A})$
recognized by $\mathcal{A}$ is the set of all ground terms accepted by $\mathcal{A}$. A set $L$ of ground
terms is recognizable if $L = L(\mathcal{A})$ for some NFTA $\mathcal{A}$. The reader should also
note that when we talk about the set recognized by a finite tree automaton $\mathcal{A}$
we are referring to the specific set $L(\mathcal{A})$, not just any set of ground terms all of
which happen to be accepted by $\mathcal{A}$. Two NFTA are said to be equivalent if
they recognize the same tree language.



Example 5. Let $\mathcal{F} = \{f(,), g(), a\}$. Consider the automaton $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$
defined by: $Q = \{q, q_g, q_f\}$, $Q_f = \{q_f\}$, and $\Delta$ =

{ $a \to q(a)$    $g(q(x)) \to q(g(x))$
  $g(q(x)) \to q_g(g(x))$    $g(q_g(x)) \to q_f(g(x))$
  $f(q(x), q(y)) \to q(f(x, y))$ }.

We now consider a ground term $t$ and exhibit three different reductions of
the term $t$ w.r.t. the move relation $\to_{\mathcal{A}}$:

$t = g(g(f(g(a), a))) \xrightarrow{*}_{\mathcal{A}} g(g(f(q_g(g(a)), q(a))))$

$t = g(g(f(g(a), a))) \xrightarrow{*}_{\mathcal{A}} g(g(q(f(g(a), a)))) \xrightarrow{*}_{\mathcal{A}} q(t)$

$t = g(g(f(g(a), a))) \xrightarrow{*}_{\mathcal{A}} g(g(q(f(g(a), a)))) \xrightarrow{*}_{\mathcal{A}} q_f(t)$

The term $t$ is accepted by $\mathcal{A}$ because of the third reduction. It is easy to
prove that $L(\mathcal{A}) = \{g(g(t)) \mid t \in T(\mathcal{F})\}$ is the set of ground instances of $g(g(x))$.
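
Acceptance by a nondeterministic automaton can be checked without enumerating reductions, by computing bottom-up the set of all states to which each subterm can be reduced. The following Python sketch does this for the automaton of Example 5. It is an illustration under assumptions: terms are nested tuples, each transition rule is recorded only by its symbol, its argument states and its target state, and the names `RULES`, `FINAL`, `reachable_states` and `accepts` are hypothetical.

```python
# A rule f(q1(x1), ..., qn(xn)) -> q(f(x1, ..., xn)) is stored as
# ((f, (q1, ..., qn)), q); this encoding is an assumption of the sketch.
RULES = [
    (("a", ()), "q"),
    (("g", ("q",)), "q"),
    (("g", ("q",)), "qg"),
    (("g", ("qg",)), "qf"),
    (("f", ("q", "q")), "q"),
]
FINAL = {"qf"}

def reachable_states(t, rules):
    """All states q such that the ground term t can be reduced to q(t)."""
    symbol, *children = t
    child_states = [reachable_states(c, rules) for c in children]
    return {q for (f, args), q in rules
            if f == symbol and len(args) == len(children)
            and all(a in s for a, s in zip(args, child_states))}

def accepts(t, rules, final):
    """t is accepted if at least one reachable state is final."""
    return bool(reachable_states(t, rules) & final)

t = ("g", ("g", ("f", ("g", ("a",)), ("a",))))
```

On the term `t` of Example 5 the computed set contains both `q` and `qf`, which corresponds to the second and third reductions above.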



The set of transition rules of a NFTA $\mathcal{A}$ can also be defined as a ground
rewrite system, i.e. a set of ground transition rules of the form: $f(q_1, \ldots, q_n) \to q$.
A move relation $\to_{\mathcal{A}}$ can be defined as before. The only difference is that,
now, we "forget" the ground subterms. And, a term $t$ is accepted by a NFTA
$\mathcal{A}$ if

$t \xrightarrow{*}_{\mathcal{A}} q$

for some final state $q$ in $Q_f$. Unless it is stated otherwise, we will now refer
to the definition with a set of ground transition rules. Considering a reduction
starting from a ground term $t$ and leading to a state $q$ with the move relation, it
is useful to remember the "history" of the reduction, i.e. to remember in which
states the ground subterms of $t$ are reduced. For this, we will adopt the following
definitions. Let $t$ be a ground term and $\mathcal{A}$ be a NFTA. A run $r$ of $\mathcal{A}$ on $t$ is
a mapping $r : \mathcal{P}os(t) \to Q$ compatible with $\Delta$, i.e. for every position $p$ in
$\mathcal{P}os(t)$, if $t(p) = f \in \mathcal{F}_n$, $r(p) = q$, $r(pi) = q_i$ for each $i \in \{1, \ldots, n\}$, then
$f(q_1, \ldots, q_n) \to q \in \Delta$. A run $r$ of $\mathcal{A}$ on $t$ is successful if $r(\varepsilon)$ is a final state.
And a ground term $t$ is accepted by a NFTA $\mathcal{A}$ if there is a successful run $r$ of
$\mathcal{A}$ on $t$.

Example 6. Let $\mathcal{F} = \{or(,), and(,), not(), 0, 1\}$. Consider the automaton
$\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by: $Q = \{q_0, q_1\}$, $Q_f = \{q_1\}$, and $\Delta$ =

{ $0 \to q_0$    $1 \to q_1$
  $not(q_0) \to q_1$    $not(q_1) \to q_0$
  $and(q_0, q_0) \to q_0$    $and(q_0, q_1) \to q_0$
  $and(q_1, q_0) \to q_0$    $and(q_1, q_1) \to q_1$
  $or(q_0, q_0) \to q_0$    $or(q_0, q_1) \to q_1$
  $or(q_1, q_0) \to q_1$    $or(q_1, q_1) \to q_1$ }.

A ground term over $\mathcal{F}$ can be viewed as a boolean formula without variables and a
run on such a ground term can be viewed as the evaluation of the corresponding
boolean formula. For instance, we give a reduction for a ground term $t$ and the
corresponding run given as a tree:

$t = and(not(or(0, 1)), or(1, not(0))) \xrightarrow{*}_{\mathcal{A}} q_0$ ; the run $r$: $q_0(q_0(q_1(q_0, q_1)), q_1(q_1, q_1(q_0)))$

The tree language recognized by $\mathcal{A}$ is the set of true boolean expressions over
$\mathcal{F}$.
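
Since the automaton of Example 6 has exactly one rule per left-hand side, its run on a term can be computed directly as a bottom-up evaluation. The following Python sketch illustrates this on the term and(not(or(0, 1)), or(1, not(0))). It makes some assumptions: terms are nested tuples, positions are tuples of 1-based child indices, the rule set is a dict from left-hand sides to states, and `RULES` and `run` are hypothetical names.

```python
# Ground rules f(q1, ..., qn) -> q stored as {(f, (q1, ..., qn)): q}.
RULES = {
    ("0", ()): "q0", ("1", ()): "q1",
    ("not", ("q0",)): "q1", ("not", ("q1",)): "q0",
    ("and", ("q0", "q0")): "q0", ("and", ("q0", "q1")): "q0",
    ("and", ("q1", "q0")): "q0", ("and", ("q1", "q1")): "q1",
    ("or", ("q0", "q0")): "q0", ("or", ("q0", "q1")): "q1",
    ("or", ("q1", "q0")): "q1", ("or", ("q1", "q1")): "q1",
}
FINAL = {"q1"}

def run(t, rules):
    """The unique run on t as a dict {position: state}, or None if stuck."""
    symbol, *children = t
    labelling, child_states = {}, []
    for i, child in enumerate(children, start=1):
        sub = run(child, rules)
        if sub is None:
            return None
        child_states.append(sub[()])
        labelling.update({(i,) + p: q for p, q in sub.items()})
    state = rules.get((symbol, tuple(child_states)))
    if state is None:
        return None
    labelling[()] = state
    return labelling

t = ("and", ("not", ("or", ("0",), ("1",))), ("or", ("1",), ("not", ("0",))))
r = run(t, RULES)
```

The run is successful exactly when `r[()]` belongs to `FINAL`; for this term the root is labeled `q0`, so the formula evaluates to false and the term is not accepted.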



NFTA with $\varepsilon$-rules

Like in the word case, it is convenient to allow $\varepsilon$-moves in the reduction of
a ground term by an automaton, i.e. the current state is changed but no new
symbol of the term is processed. This is done by introducing a new type of rules
in the set of transition rules of an automaton. A NFTA with $\varepsilon$-rules is like
a NFTA except that now the set of transition rules contains ground transition
rules of the form $f(q_1, \ldots, q_n) \to q$, and $\varepsilon$-rules of the form $q \to q'$. The ability
to make $\varepsilon$-moves does not allow the NFTA to accept non recognizable sets. But
NFTA with $\varepsilon$-rules are useful in some constructions and simplify some proofs.

Example 7. Let $\mathcal{F} = \{cons(,), s(), 0, nil\}$. Consider the automaton
$\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by: $Q = \{q_{Nat}, q_{List}, q_{List*}\}$, $Q_f = \{q_{List}\}$, and $\Delta$ =

{ $0 \to q_{Nat}$    $s(q_{Nat}) \to q_{Nat}$
  $nil \to q_{List}$    $cons(q_{Nat}, q_{List}) \to q_{List*}$
  $q_{List*} \to q_{List}$ }.

The recognized tree language is the set of Lisp-like lists of integers. If the final
state set $Q_f$ is set to $\{q_{List*}\}$, then the recognized tree language is the set of
non empty Lisp-like lists of integers. The $\varepsilon$-rule $q_{List*} \to q_{List}$ says that a non
empty list is a list. The reader should recognize the definition of an order-sorted
algebra with the sorts Nat, List, and List* (which stands for the non empty lists),
and the inclusion List* $\subseteq$ List (see Section 3.4.1).



Theorem 1 (The equivalence of NFTAs with and without $\varepsilon$-rules). If
$L$ is recognized by a NFTA with $\varepsilon$-rules, then $L$ is recognized by a NFTA without
$\varepsilon$-rules.

Proof. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a NFTA with $\varepsilon$-rules. Consider the subset $\Delta_\varepsilon$
consisting of those $\varepsilon$-rules in $\Delta$. We denote by $\varepsilon$-closure$(q)$ the set of all states
$q'$ in $Q$ such that there is a reduction of $q$ into $q'$ using rules in $\Delta_\varepsilon$. We consider
that $q \in \varepsilon$-closure$(q)$. This computation is a transitive closure computation and
can be done in $O(|Q|^3)$. Now let us define the NFTA $\mathcal{A}' = (Q, \mathcal{F}, Q_f, \Delta')$ where
$\Delta'$ is defined by:

$\Delta' = \{f(q_1, \ldots, q_n) \to q' \mid f(q_1, \ldots, q_n) \to q \in \Delta,\ q' \in \varepsilon\text{-closure}(q)\}$

Then it may be proved that $t \xrightarrow{*}_{\mathcal{A}} q$ iff $t \xrightarrow{*}_{\mathcal{A}'} q$. □
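
The construction used in this proof can be sketched in Python on the list automaton of Example 7. The sketch is an illustration under assumptions: ground rules are stored as pairs `((f, (q1, ..., qn)), q)`, ε-rules as pairs `(q, q')`, states are written `Nat`, `List` and `List*`, and all names are hypothetical. For brevity the closure is computed by a search from each state rather than by the cubic transitive closure computation mentioned in the proof.

```python
RULES = {
    (("0", ()), "Nat"),
    (("s", ("Nat",)), "Nat"),
    (("nil", ()), "List"),
    (("cons", ("Nat", "List")), "List*"),
}
EPS_RULES = {("List*", "List")}

def epsilon_closure(q, eps_rules):
    """States reachable from q by epsilon-rules, q included."""
    closure, todo = {q}, [q]
    while todo:
        p = todo.pop()
        for src, dst in eps_rules:
            if src == p and dst not in closure:
                closure.add(dst)
                todo.append(dst)
    return closure

def remove_epsilon(rules, eps_rules):
    """Rules f(q1, ..., qn) -> q' for each rule to q and q' in closure(q)."""
    return {(lhs, q2) for lhs, q in rules
            for q2 in epsilon_closure(q, eps_rules)}

def reachable_states(t, rules):
    """States reachable from the ground term t using ground rules only."""
    symbol, *children = t
    child_states = [reachable_states(c, rules) for c in children]
    return {q for (f, args), q in rules
            if f == symbol and len(args) == len(children)
            and all(a in s for a, s in zip(args, child_states))}

NEW_RULES = remove_epsilon(RULES, EPS_RULES)
two_elements = ("cons", ("0",), ("cons", ("s", ("0",)), ("nil",)))
```

The only rule added here is the one sending `cons(Nat, List)` to `List`; without it, a list of two elements cannot be reduced by the ground rules alone.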

Unless it is stated otherwise, we will now consider NFTA without $\varepsilon$-rules.

Deterministic Finite Tree Automata 

Our definition of tree automata corresponds to the notion of nondeterministic
finite tree automata. We will now define deterministic tree automata (DFTA)
which are a special case of NFTA. It will turn out that, like in the word case, any
language recognized by a NFTA can also be recognized by a DFTA. However,
the NFTA are useful in proving theorems in tree language theory.

A tree automaton $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ is deterministic (DFTA) if there are
no two rules with the same left-hand side (and no $\varepsilon$-rule). Given a DFTA, there
is at most one run for every ground term, i.e. for every ground term $t$, there is
at most one state $q$ such that $t \xrightarrow{*}_{\mathcal{A}} q$. The reader should note that it is possible
to define a tree automaton in which there are two rules with the same left-hand
side such that there is at most one run for every ground term (see Example 8).

It is also useful to consider tree automata such that there is at least one
run for every ground term. This leads to the following definition. A NFTA $\mathcal{A}$
is complete if there is at least one rule $f(q_1, \ldots, q_n) \to q \in \Delta$ for all $n \geq 0$,
$f \in \mathcal{F}_n$, and $q_1, \ldots, q_n \in Q$. Let us note that for a complete DFTA there is
exactly one run for every ground term.

Lastly, for practical reasons, it is usual to consider automata in which
unnecessary states are eliminated. A state $q$ is accessible if there exists a ground
term $t$ such that $t \xrightarrow{*}_{\mathcal{A}} q$. A NFTA $\mathcal{A}$ is said to be reduced if all its states are
accessible.



Example 8.

The automaton defined in Example 5 is reduced, not complete, and it is not
deterministic because there are two rules of left-hand side $g(q(x))$. Let us also
note (see Example 5) that at least two runs (one is successful) can be defined
on the term $g(g(f(g(a), a)))$.

The automaton defined in Example 6 is a complete and reduced DFTA.

Let $\mathcal{F} = \{g(), a\}$. Consider the automaton $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by:
$Q = \{q_0, q_1, q\}$, $Q_f = \{q_0\}$, and $\Delta$ is the following set of transition rules:

{ $a \to q_0$    $g(q_0) \to q_1$
  $g(q_1) \to q_0$    $g(q) \to q_0$
  $g(q) \to q_1$ }.

This automaton is not deterministic because there are two rules of left-hand
side $g(q)$, and it is not reduced because state $q$ is not accessible. Nevertheless, one
should note that there is at most one run for every ground term $t$.

Let $\mathcal{F} = \{f(,), g(), a\}$. Consider the automaton $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined
in Example 4 by: $Q = \{q_a, q_g, q_f\}$, $Q_f = \{q_f\}$, and $\Delta$ is the following set of
transition rules:

{ $a \to q_a$    $g(q_a) \to q_g$
  $g(q_g) \to q_g$    $f(q_g, q_g) \to q_f$ }.

This automaton is deterministic and reduced. It is not complete because, for
instance, there is no transition rule of left-hand side $f(q_a, q_a)$. It is easy to define
a deterministic and complete automaton $\mathcal{A}'$ recognizing the same language by
adding a "dead state". The automaton $\mathcal{A}' = (Q', \mathcal{F}, Q_f, \Delta')$ is defined by:
$Q' = Q \cup \{\pi\}$, $\Delta' = \Delta \cup$

{ $g(q_f) \to \pi$    $g(\pi) \to \pi$
  $f(q_a, q_a) \to \pi$    $f(q_a, q_g) \to \pi$
  ...    ...
  $f(\pi, \pi) \to \pi$ }.



It is easy to generalize the construction given in Example 8 of a complete
NFTA equivalent to a given NFTA: add a "dead state" $\pi$ and all transition
rules with right-hand side $\pi$ such that the automaton is complete. The reader
should note that this construction could be expensive because it may require
$O(|\mathcal{F}| \times |Q|^{\mathit{Arity}(\mathcal{F})})$ new rules, where $\mathit{Arity}(\mathcal{F})$ is the maximal arity of symbols
in $\mathcal{F}$. Therefore we have the following:

Theorem 2. Let $L$ be a recognizable set of ground terms. Then there exists a
complete finite tree automaton that accepts $L$.
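
The completion by a dead state can be sketched in Python as follows, using the deterministic automaton of Example 4 in its ground-rule form. This is an illustration under assumptions: rules are pairs `((f, (q1, ..., qn)), q)`, the dead state is the string `"pi"`, and `complete`, `ARITY`, `STATES` and `RULES` are hypothetical names.

```python
from itertools import product

ARITY = {"f": 2, "g": 1, "a": 0}
STATES = {"qa", "qg", "qf"}
RULES = {
    (("a", ()), "qa"),
    (("g", ("qa",)), "qg"),
    (("g", ("qg",)), "qg"),
    (("f", ("qg", "qg")), "qf"),
}

def complete(states, arity, rules, dead="pi"):
    """Add a dead state and every missing rule, all with right-hand side dead."""
    new_states = set(states) | {dead}
    defined = {lhs for lhs, _ in rules}
    new_rules = set(rules)
    for f, n in arity.items():
        # Enumerate every possible left-hand side f(q1, ..., qn).
        for args in product(sorted(new_states), repeat=n):
            if (f, args) not in defined:
                new_rules.add(((f, args), dead))
    return new_states, new_rules

NEW_STATES, NEW_RULES = complete(STATES, ARITY, RULES)
```

For this automaton, 2 rules are added for $g$ and 15 for $f$. The enumeration over all tuples of states is what makes the construction expensive in general.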

We now give a polynomial algorithm which outputs a reduced NFTA
equivalent to a given NFTA as input. The main loop of this algorithm computes the
set of accessible states.

Reduction Algorithm RED
input: NFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$
begin
    Set Marked to $\emptyset$ /* Marked is the set of accessible states */
    repeat
        Set Marked to Marked $\cup$ $\{q\}$
        where
            $f \in \mathcal{F}_n$, $q_1, \ldots, q_n \in$ Marked, $f(q_1, \ldots, q_n) \to q \in \Delta$
    until no state can be added to Marked
    Set $Q_r$ to Marked
    Set $Q_{rf}$ to $Q_f \cap$ Marked
    Set $\Delta_r$ to $\{f(q_1, \ldots, q_n) \to q \in \Delta \mid q, q_1, \ldots, q_n \in$ Marked$\}$
    output: NFTA $\mathcal{A}_r = (Q_r, \mathcal{F}, Q_{rf}, \Delta_r)$
end



Obviously all states in the set Marked are accessible, and an easy induction
shows that all accessible states are in the set Marked. Moreover, the NFTA $\mathcal{A}_r$
accepts the tree language $L(\mathcal{A})$. Consequently we have:

Theorem 3. Let $L$ be a recognizable set of ground terms. Then there exists a
reduced finite tree automaton that accepts $L$.
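
A direct Python transcription of the reduction algorithm might look as follows. It is a sketch under assumptions, not the authors' implementation: rules are pairs `((f, (q1, ..., qn)), q)`, the function name `reduce_nfta` is hypothetical, and the input is the automaton over $\{g(), a\}$ from Example 8, whose state $q$ is not accessible.

```python
FINAL = {"q0"}
RULES = {
    (("a", ()), "q0"),
    (("g", ("q0",)), "q1"),
    (("g", ("q1",)), "q0"),
    (("g", ("q",)), "q0"),
    (("g", ("q",)), "q1"),
}

def reduce_nfta(final, rules):
    """Return (Q_r, Q_rf, Delta_r): accessible states, final ones, kept rules."""
    marked = set()
    changed = True
    while changed:                      # repeat ... until no state can be added
        changed = False
        for (f, args), q in rules:
            if q not in marked and all(a in marked for a in args):
                marked.add(q)
                changed = True
    kept = {((f, args), q) for (f, args), q in rules
            if q in marked and all(a in marked for a in args)}
    return marked, final & marked, kept

Q_R, Q_RF, DELTA_R = reduce_nfta(FINAL, RULES)
```

The loop scans the rule set once per newly marked state in the worst case, so this naive version runs in time polynomial in the size of the automaton; the two rules with left-hand side $g(q)$ are dropped from the result.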

Now, we consider the reduction of nondeterminism. Since every DFTA is
a NFTA, it is clear that the class of recognizable languages includes the class
of languages accepted by DFTAs. However, it turns out that these classes are
equal. We prove that, for every NFTA, we can construct an equivalent DFTA.
The proof is similar to the proof of equivalence between DFAs and NFAs in the
word case and is based on the "subset construction". Consequently, the
number of states of the equivalent DFTA can be exponential in the number of
states of the given NFTA (see Example 10). But, in practice, it often turns
out that many states are not accessible. Therefore, we will present in the proof
of the following theorem a construction of a DFTA where only the accessible
states are considered, i.e. the given algorithm outputs an equivalent and reduced
DFTA from a given NFTA as input.

Theorem 4 (The equivalence of DFTAs and NFTAs). Let $L$ be a
recognizable set of ground terms. Then there exists a DFTA that accepts $L$.

Proof. First, we give a theoretical construction of a DFTA equivalent to a
NFTA. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a NFTA. Define a DFTA $\mathcal{A}_d = (Q_d, \mathcal{F}, Q_{df}, \Delta_d)$
as follows. The states of $Q_d$ are all the subsets of the state set $Q$ of $\mathcal{A}$. That
is, $Q_d = 2^Q$. We denote by $s$ a state of $Q_d$, i.e. $s = \{q_1, \ldots, q_n\}$ for some states
$q_1, \ldots, q_n \in Q$. We define

$f(s_1, \ldots, s_n) \to s \in \Delta_d$

iff

$s = \{q \in Q \mid \exists q_1 \in s_1, \ldots, \exists q_n \in s_n,\ f(q_1, \ldots, q_n) \to q \in \Delta\}$.

And $Q_{df}$ is the set of all states in $Q_d$ containing a final state of $\mathcal{A}$.

We now give an algorithmic construction where only the accessible states
are considered.

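Determinization Algorithm DET
input: NFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$
begin
    /* A state s of the equivalent DFTA is in $2^Q$ */
    Set $Q_d$ to $\emptyset$; set $\Delta_d$ to $\emptyset$
    repeat
        Set $Q_d$ to $Q_d \cup \{s\}$; Set $\Delta_d$ to $\Delta_d \cup \{f(s_1, \ldots, s_n) \to s\}$
        where
            $f \in \mathcal{F}_n$, $s_1, \ldots, s_n \in Q_d$,
            $s = \{q \in Q \mid \exists q_1 \in s_1, \ldots, q_n \in s_n,\ f(q_1, \ldots, q_n) \to q \in \Delta\}$
    until no rule can be added to $\Delta_d$
    Set $Q_{df}$ to $\{s \in Q_d \mid s \cap Q_f \neq \emptyset\}$
    output: DFTA $\mathcal{A}_d = (Q_d, \mathcal{F}, Q_{df}, \Delta_d)$
end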

It is immediate from the definition of the determinization algorithm that
$\mathcal{A}_d$ is a deterministic and reduced tree automaton. In order to prove that
$L(\mathcal{A}) = L(\mathcal{A}_d)$, we now prove that:

$(t \xrightarrow{*}_{\mathcal{A}_d} s)$ iff $(s = \{q \in Q \mid t \xrightarrow{*}_{\mathcal{A}} q\})$.

The proof is an easy induction on the structure of terms.

• base case: let us consider $t = a \in \mathcal{F}_0$. Then, there is only one rule $a \to s$
in $\Delta_d$ where $s = \{q \in Q \mid a \to q \in \Delta\}$.

• induction step: let us consider a term $t = f(t_1, \ldots, t_n)$.

  - First, let us suppose that $t \xrightarrow{*}_{\mathcal{A}_d} f(s_1, \ldots, s_n) \to_{\mathcal{A}_d} s$. By induction
    hypothesis, for each $i \in \{1, \ldots, n\}$, $s_i = \{q \in Q \mid t_i \xrightarrow{*}_{\mathcal{A}} q\}$. States $s_i$
    are in $Q_d$, thus a rule $f(s_1, \ldots, s_n) \to s \in \Delta_d$ is added in the set $\Delta_d$
    by the determinization algorithm and $s = \{q \in Q \mid \exists q_1 \in s_1, \ldots, q_n \in s_n,\ f(q_1, \ldots, q_n) \to q \in \Delta\}$.
    Thus, $s = \{q \in Q \mid t \xrightarrow{*}_{\mathcal{A}} q\}$.

  - Second, let us consider $s = \{q \in Q \mid t = f(t_1, \ldots, t_n) \xrightarrow{*}_{\mathcal{A}} q\}$. Let
    us consider the state sets $s_i$ defined by $s_i = \{q \in Q \mid t_i \xrightarrow{*}_{\mathcal{A}} q\}$.
    By induction hypothesis, for each $i \in \{1, \ldots, n\}$, $t_i \xrightarrow{*}_{\mathcal{A}_d} s_i$. Thus
    $s = \{q \in Q \mid \exists q_1 \in s_1, \ldots, q_n \in s_n,\ f(q_1, \ldots, q_n) \to q \in \Delta\}$. The
    rule $f(s_1, \ldots, s_n) \to s$ is in $\Delta_d$ by definition of the rule set $\Delta_d$ in the
    determinization algorithm, and $t \xrightarrow{*}_{\mathcal{A}_d} s$.

□



Example 9. Let $\mathcal{F} = \{f(,), g(), a\}$. Consider the automaton $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$
defined in Example 5 by: $Q = \{q, q_g, q_f\}$, $Q_f = \{q_f\}$, and

$$\Delta = \{\; a \to q,\quad g(q) \to q,\quad g(q) \to q_g,\quad g(q_g) \to q_f,\quad f(q, q) \to q \;\}.$$

Given $\mathcal{A}$ as input, the determinization algorithm outputs the DFTA
$\mathcal{A}_d = (Q_d, \mathcal{F}, Q_{df}, \Delta_d)$ defined by: $Q_d = \{\{q\}, \{q, q_g\}, \{q, q_g, q_f\}\}$,
$Q_{df} = \{\{q, q_g, q_f\}\}$, and

$$\Delta_d = \{\; a \to \{q\},\quad g(\{q\}) \to \{q, q_g\},\quad g(\{q, q_g\}) \to \{q, q_g, q_f\},\quad g(\{q, q_g, q_f\}) \to \{q, q_g, q_f\} \;\}$$
$$\cup\ \{\; f(s_1, s_2) \to \{q\} \mid s_1, s_2 \in Q_d \;\}.$$



We now give an example where an exponential blow-up occurs in the
determinization process. This example is the same as the one used in the word case.

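Example 10. Let $\mathcal{F} = \{f(), g(), a\}$ and let $n$ be an integer. And let us consider
the tree language

$$L = \{t \in T(\mathcal{F}) \mid \text{the symbol at position } \underbrace{1 \cdots 1}_{n} \text{ is } f\}.$$

Let us consider the NFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by: $Q = \{q, q_1, \dots, q_{n+1}\}$,
$Q_f = \{q_{n+1}\}$, and

$$\Delta = \{\; a \to q,\quad f(q) \to q,\quad g(q) \to q,\quad f(q) \to q_1,$$
$$g(q_1) \to q_2,\quad f(q_1) \to q_2,\quad \dots,\quad g(q_n) \to q_{n+1},\quad f(q_n) \to q_{n+1} \;\}.$$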

The NFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ accepts the tree language $L$, and it has $n + 2$
states. Using the subset construction, the equivalent DFTA $\mathcal{A}_d$ has $2^{n+1}$ states.
Any equivalent automaton has to memorize the $n + 1$ last symbols of the input
tree. Therefore, it can be proved that any DFTA accepting $L$ has at least $2^{n+1}$
states. It could also be proved that the automaton $\mathcal{A}_d$ is minimal in the number
of states (minimal tree automata are defined in Section 1.5).

If a finite tree automaton is deterministic, we can replace the transition
relation $\Delta$ by a transition function $\delta$. Therefore, it is sometimes convenient to
consider a DFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \delta)$ where

$$\delta : \bigcup_n \mathcal{F}_n \times Q^n \to Q.$$

The computation of such an automaton on a term $t$ as input tree can be viewed
as an evaluation of $t$ on the finite domain $Q$. Indeed, define the labeling function
$\hat{\delta} : T(\mathcal{F}) \to Q$ inductively by

$$\hat{\delta}(f(t_1, \dots, t_n)) = \delta(f, \hat{\delta}(t_1), \dots, \hat{\delta}(t_n)).$$

We shall for convenience confuse $\delta$ and $\hat{\delta}$.

We now make clear the connections between our definitions and the language
theoretical definitions of tree automata and of recognizable tree languages.
Indeed, the reader should note that a complete DFTA is just a finite $\mathcal{F}$-algebra
$\mathcal{A}$ consisting of a finite carrier $|\mathcal{A}| = Q$ and a distinguished $n$-ary function
$f^{\mathcal{A}} : Q^n \to Q$ for each $n$-ary symbol $f \in \mathcal{F}$, together with a specified subset $Q_f$
of $Q$. A ground term $t$ is accepted by $\mathcal{A}$ if $\delta(t) = q \in Q_f$, where $\delta$ is the unique
$\mathcal{F}$-algebra homomorphism $\delta : T(\mathcal{F}) \to \mathcal{A}$.

Example 11. Let $\mathcal{F} = \{f(,), a\}$ and consider the $\mathcal{F}$-algebra $\mathcal{A}$ with
$|\mathcal{A}| = Q = \mathbb{Z}_2 = \{0, 1\}$, $f^{\mathcal{A}} = +$ where the sum is formed modulo 2, $a^{\mathcal{A}} = 1$, and let
$Q_f = \{0\}$. $\mathcal{A}$ and $Q_f$ define a DFTA. The recognized tree language is the set
of ground terms over $\mathcal{F}$ with an even number of leaves.
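The view of a complete DFTA as a finite algebra can be made concrete with a short sketch. The encoding of ground terms as nested tuples `(symbol, child, ...)` and the names `evaluate`, `INTERP` and `accepts` are assumptions chosen for this illustration; the interpretation itself is the one of Example 11.

```python
def evaluate(term, interp):
    """Evaluate a ground term bottom-up in a finite algebra.

    term:   nested tuple (symbol, child_1, ..., child_n)
    interp: dict symbol -> function on the carrier (the operation f^A)
    This is the homomorphism called delta in the text.
    """
    symbol, *children = term
    return interp[symbol](*(evaluate(c, interp) for c in children))

# Algebra of Example 11: carrier {0, 1}, f is addition modulo 2, a is 1.
INTERP = {"f": lambda x, y: (x + y) % 2, "a": lambda: 1}
FINAL_STATES = {0}

def accepts(term):
    """A term is accepted when it evaluates to a final state."""
    return evaluate(term, INTERP) in FINAL_STATES

a = ("a",)
two_leaves = ("f", a, a)
three_leaves = ("f", a, ("f", a, a))
```

With these definitions, `accepts(two_leaves)` holds while `accepts(three_leaves)` and `accepts(a)` do not, as expected for the language of terms with an even number of leaves.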

Since DFTA and NFTA accept the same sets of tree languages, we shall not 
distinguish between them unless it becomes necessary, but shall simply refer to 
both as tree automata (FTA). 

1.2 The Pumping Lemma for Recognizable Tree 
Languages 

We now give an example of a tree language which is not recognizable. 

Example 12. Let $\mathcal{F} = \{f(,), g(), a\}$. Let us consider the tree language
$L = \{f(g^i(a), g^i(a)) \mid i > 0\}$. Let us suppose that $L$ is recognizable by an
automaton $\mathcal{A}$ having $k$ states. Now, consider the term $t = f(g^k(a), g^k(a))$. The term $t$
belongs to $L$, therefore there is a successful run of $\mathcal{A}$ on $t$. As $k$ is the cardinality
of the state set, there are two distinct positions along the first branch of the term
labeled with the same state. Therefore, one could cut the first branch between
these two positions, leading to a term $t' = f(g^j(a), g^k(a))$ with $j < k$ such that
a successful run of $\mathcal{A}$ can be defined on $t'$. This leads to a contradiction with
$L(\mathcal{A}) = L$.

This (sketch of) proof can be generalized by proving a pumping lemma
for recognizable tree languages. This lemma is extremely useful in proving that
certain sets of ground terms are not recognizable. It is also useful for solving
decision problems like emptiness and finiteness of a recognizable tree language
(see Section 1.7).

Pumping Lemma. Let $L$ be a recognizable set of ground terms. Then, there
exists a constant $k > 0$ satisfying: for every ground term $t$ in $L$ such that
$\mathit{Height}(t) > k$, there exist a context $C \in \mathcal{C}(\mathcal{F})$, a non trivial context $C' \in \mathcal{C}(\mathcal{F})$,
and a ground term $u$ such that $t = C[C'[u]]$ and, for all $n \geq 0$, $C[{C'}^n[u]] \in L$.

Proof. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a FTA such that $L = L(\mathcal{A})$ and let $k = |Q|$
be the cardinality of the state set $Q$. Let us consider a ground term $t$ in $L$
such that $\mathit{Height}(t) > k$ and consider a successful run $r$ of $\mathcal{A}$ on $t$. Now let us
consider a path in $t$ of length strictly greater than $k$. As $k$ is defined to be the
cardinality of the state set $Q$, there are two positions $p_1 < p_2$ along this path
such that $r(p_1) = r(p_2) = q$ for some state $q$. Let $u$ be the ground subterm of $t$
at position $p_2$. Let $u'$ be the ground subterm of $t$ at position $p_1$; there exists a
non-trivial context $C'$ such that $u' = C'[u]$. Now define the context $C$ such that
$t = C[C'[u]]$. Consider a term $C[{C'}^n[u]]$ for some integer $n > 1$; a successful run
can be defined on this term. Indeed suppose that $r$ corresponds to the reduction
$t \xrightarrow[\mathcal{A}]{*} q_f$ where $q_f$ is a final state of $\mathcal{A}$, then we have:

$$C[{C'}^n[u]] \xrightarrow[\mathcal{A}]{*} C[{C'}^n[q]] \xrightarrow[\mathcal{A}]{*} C[{C'}^{n-1}[q]] \cdots \xrightarrow[\mathcal{A}]{*} C[q] \xrightarrow[\mathcal{A}]{*} q_f.$$

The same holds when $n = 0$. □
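The decomposition used in the proof can be replayed on a small deterministic automaton. The sketch below is illustrative only: terms are nested tuples, positions are tuples of 1-based child indices, and the helper names (`run`, `subterm`, `replace`, `find_repeat`, `pumped`) are hypothetical. The automaton is the parity DFTA of Example 11, written as a transition table.

```python
def run(term, delta):
    """State reached by a DFTA on `term`; delta maps (symbol, states) -> state."""
    f, *kids = term
    return delta[(f, tuple(run(k, delta) for k in kids))]

def subterm(term, pos):
    """Subterm at position `pos` (child i of a node is tuple index i)."""
    for i in pos:
        term = term[i]
    return term

def replace(term, pos, new):
    """Copy of `term` with the subterm at `pos` replaced by `new`."""
    if not pos:
        return new
    i = pos[0]
    return term[:i] + (replace(term[i], pos[1:], new),) + term[i + 1:]

def find_repeat(term, delta):
    """Positions p1 < p2 on one path carrying the same state, or None."""
    def search(pos, seen):
        q = run(subterm(term, pos), delta)
        if q in seen:
            return seen[q], pos
        seen = {**seen, q: pos}
        for i in range(1, len(subterm(term, pos))):
            found = search(pos + (i,), seen)
            if found:
                return found
        return None
    return search((), {})

def pumped(term, p1, p2, n):
    """Build C[C'^n[u]] where u = term|p2 and C'[u] = term|p1."""
    hole = p2[len(p1):]                  # position of the hole inside C'
    inner = subterm(term, p2)            # this is u
    for _ in range(n):
        inner = replace(subterm(term, p1), hole, inner)
    return replace(term, p1, inner)

# Parity automaton of Example 11 as a transition table.
DELTA = {("a", ()): 1, ("f", (0, 0)): 0, ("f", (0, 1)): 1,
         ("f", (1, 0)): 1, ("f", (1, 1)): 0}
FINAL_STATES = {0}

a = ("a",)
t = ("f", ("f", ("f", a, a), a), a)      # four leaves, height greater than |Q| = 2
p1, p2 = find_repeat(t, DELTA)
```

Here `pumped(t, p1, p2, 1)` rebuilds `t` itself, and every `pumped(t, p1, p2, n)` is again accepted, including the case `n = 0`.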



Example 13. Let $\mathcal{F} = \{f(,), a\}$. Let us consider the tree language
$L = \{t \in T(\mathcal{F}) \mid |\mathit{Pos}(t)| \text{ is a prime number}\}$. We can prove that $L$ is not recognizable.
For all $k > 0$, consider a term $t$ in $L$ whose height is greater than $k$. For all
contexts $C$, non trivial contexts $C'$, and terms $u$ such that $t = C[C'[u]]$, there
exists $n$ such that $C[{C'}^n[u]] \notin L$.

From the Pumping Lemma, we derive conditions for emptiness and finiteness
given by the following corollary:

Corollary 1. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a FTA. Then $L(\mathcal{A})$ is non empty if
and only if there exists a term $t$ in $L(\mathcal{A})$ with $\mathit{Height}(t) \leq |Q|$. Then $L(\mathcal{A})$ is
infinite if and only if there exists a term $t$ in $L(\mathcal{A})$ with $|Q| < \mathit{Height}(t) \leq 2|Q|$.

1.3 Closure Properties of Recognizable Tree
Languages

A closure property of a class of (tree) languages is the fact that the class
is closed under a particular operation. We are interested in effective closure
properties where, given representations for languages in the class, there is an
algorithm to construct a representation for the language that results by applying
the operation to these languages. Let us note that the equivalence between
NFTA and DFTA is effective, thus we may choose the representation that suits
us best. Nevertheless, the determinization algorithm may output a DFTA whose
number of states is exponential in the number of states of the given NFTA.
For the different closure properties, we give effective constructions and we give
the properties of the resulting FTA depending on the properties of the given
FTA as input. In this section, we consider the Boolean set operations: union,
intersection, and complementation. Other operations will be studied in the next
sections. Complexity results are given in Section 1.7.

Theorem 5. The class of recognizable tree languages is closed under union,
under complementation, and under intersection.

Union 

Let $L_1$ and $L_2$ be two recognizable tree languages. Thus there are tree
automata $\mathcal{A}_1 = (Q_1, \mathcal{F}, Q_{f1}, \Delta_1)$ and $\mathcal{A}_2 = (Q_2, \mathcal{F}, Q_{f2}, \Delta_2)$ with $L_1 = L(\mathcal{A}_1)$
and $L_2 = L(\mathcal{A}_2)$. Since we may rename states of a tree automaton, without
loss of generality, we may suppose that $Q_1 \cap Q_2 = \emptyset$. Now, let us consider
the FTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by: $Q = Q_1 \cup Q_2$, $Q_f = Q_{f1} \cup Q_{f2}$, and
$\Delta = \Delta_1 \cup \Delta_2$. The equality between $L(\mathcal{A})$ and $L(\mathcal{A}_1) \cup L(\mathcal{A}_2)$ is straightforward.
Let us note that $\mathcal{A}$ is nondeterministic and not complete, even if $\mathcal{A}_1$ and $\mathcal{A}_2$ are
deterministic and complete.

We now give another construction which preserves determinism. The intuitive
idea is to process a term in parallel by the two automata. For this we
consider a product automaton. Let us suppose that $\mathcal{A}_1$ and $\mathcal{A}_2$ are complete.
And, let us consider the FTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by: $Q = Q_1 \times Q_2$,
$Q_f = Q_{f1} \times Q_2 \cup Q_1 \times Q_{f2}$, and $\Delta = \Delta_1 \times \Delta_2$ where

$$\Delta_1 \times \Delta_2 = \{f((q_1, q'_1), \dots, (q_n, q'_n)) \to (q, q') \mid f(q_1, \dots, q_n) \to q \in \Delta_1,\ f(q'_1, \dots, q'_n) \to q' \in \Delta_2\}.$$

The proof of the equality between $L(\mathcal{A})$ and $L(\mathcal{A}_1) \cup L(\mathcal{A}_2)$ is left to the reader,
but the reader should note that the hypothesis that the two given tree automata
are complete is crucial in the proof. Indeed, suppose for instance that a ground
term $t$ is accepted by $\mathcal{A}_1$ but not by $\mathcal{A}_2$. Moreover suppose that $\mathcal{A}_2$ is not
complete and that there is no run of $\mathcal{A}_2$ on $t$, then the product automaton does
not accept $t$ because there is no run of the product automaton on $t$. The reader
should also note that the construction preserves determinism, i.e. if the two given
automata are deterministic, then the product automaton is also deterministic.
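A minimal sketch of the product construction for union, assuming two complete DFTAs given as transition tables `(symbol, argument_states) -> state`. The function names and both sample automata are choices made for this illustration: the first is the parity automaton of Example 11, the second is a hypothetical complete DFTA accepting only the constant `a`.

```python
from itertools import product

def product_rules(rules1, rules2):
    """Delta_1 x Delta_2: pair up rules that carry the same symbol."""
    rules = {}
    for (f, qs), q in rules1.items():
        for (g, ps), p in rules2.items():
            if f == g:
                rules[(f, tuple(zip(qs, ps)))] = (q, p)
    return rules

def union_finals(states1, finals1, states2, finals2):
    """Q_f1 x Q_2 union Q_1 x Q_f2."""
    return set(product(finals1, states2)) | set(product(states1, finals2))

def run(term, rules):
    """State reached by a complete DFTA on a ground term (nested tuples)."""
    f, *kids = term
    return rules[(f, tuple(run(k, rules) for k in kids))]

# A1: even number of leaves (Example 11).
STATES1, FINALS1 = {0, 1}, {0}
RULES1 = {("a", ()): 1, ("f", (0, 0)): 0, ("f", (0, 1)): 1,
          ("f", (1, 0)): 1, ("f", (1, 1)): 0}

# A2 (hypothetical): accepts only the term a; complete thanks to the sink "other".
STATES2, FINALS2 = {"leaf", "other"}, {"leaf"}
RULES2 = {("a", ()): "leaf"}
RULES2.update({("f", pair): "other" for pair in product(STATES2, repeat=2)})

RULES = product_rules(RULES1, RULES2)
FINALS = union_finals(STATES1, FINALS1, STATES2, FINALS2)

a = ("a",)
```

Because both inputs are complete, every ground term has a run in the product, which is exactly the hypothesis the text singles out as crucial.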

Complementation

Let $L$ be a recognizable tree language. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a complete
DFTA such that $L(\mathcal{A}) = L$. Now, complement the final state set to recognize
the complement of $L$. That is, let $\mathcal{A}^c = (Q, \mathcal{F}, Q^c_f, \Delta)$ with $Q^c_f = Q \setminus Q_f$; the
DFTA $\mathcal{A}^c$ recognizes the complement of the set $L$ in $T(\mathcal{F})$.

If the input automaton $\mathcal{A}$ is a NFTA, then first apply the determinization
algorithm, and second complement the final state set. This could lead to an
exponential blow-up.

Intersection

Closure under intersection follows from closure under union and complementation
because

$$L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}},$$

where we denote by $\overline{L}$ the complement of the set $L$ in $T(\mathcal{F})$. But if the recognizable
tree languages are defined by NFTA, we have to use the complementation
construction, therefore the determinization process is used, leading to an
exponential blow-up. Consequently, we now give a direct construction which
does not use the determinization algorithm. Let $\mathcal{A}_1 = (Q_1, \mathcal{F}, Q_{f1}, \Delta_1)$ and
$\mathcal{A}_2 = (Q_2, \mathcal{F}, Q_{f2}, \Delta_2)$ be FTA such that $L(\mathcal{A}_1) = L_1$ and $L(\mathcal{A}_2) = L_2$. And,
consider the FTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by: $Q = Q_1 \times Q_2$, $Q_f = Q_{f1} \times Q_{f2}$,
and $\Delta = \Delta_1 \times \Delta_2$. $\mathcal{A}$ recognizes $L_1 \cap L_2$. Moreover the reader should note that
$\mathcal{A}$ is deterministic if $\mathcal{A}_1$ and $\mathcal{A}_2$ are deterministic.



1.4 Tree Homomorphisms 

We now consider tree transformations and study the closure properties under
these tree transformations. In this section we are interested in tree
transformations preserving the structure of trees. Thus, we restrict ourselves to tree
homomorphisms. Tree homomorphisms are a generalization of homomorphisms
for words (considered as unary terms) to the case of arbitrary ranked alphabets.
In the word case, it is known that the class of regular sets is closed
under homomorphisms and inverse homomorphisms. The situation is different
in the tree case because whereas recognizable tree languages are closed under
inverse homomorphisms, they are closed only under a subclass of homomorphisms,
i.e. linear homomorphisms (duplication of terms is forbidden). First, we
define tree homomorphisms.

Let $\mathcal{F}$ and $\mathcal{F}'$ be two sets of function symbols, possibly not disjoint. For
each $n > 0$ such that $\mathcal{F}$ contains a symbol of arity $n$, we define a set of variables
$X_n = \{x_1, \dots, x_n\}$ disjoint from $\mathcal{F}$ and $\mathcal{F}'$.

Let $h_{\mathcal{F}}$ be a mapping which, with $f \in \mathcal{F}$ of arity $n$, associates a term
$t_f \in T(\mathcal{F}', X_n)$. The tree homomorphism $h : T(\mathcal{F}) \to T(\mathcal{F}')$ determined by
$h_{\mathcal{F}}$ is defined as follows:

• $h(a) = t_a \in T(\mathcal{F}')$ for each $a \in \mathcal{F}$ of arity 0,

• $h(f(t_1, \dots, t_n)) = t_f\{x_1 \leftarrow h(t_1), \dots, x_n \leftarrow h(t_n)\}$

where $t_f\{x_1 \leftarrow h(t_1), \dots, x_n \leftarrow h(t_n)\}$ is the result of applying the
substitution $\{x_1 \leftarrow h(t_1), \dots, x_n \leftarrow h(t_n)\}$ to the term $t_f$.



Example 14. Let $\mathcal{F} = \{g(,,), a, b\}$ and $\mathcal{F}' = \{f(,), a, b\}$. Let us consider the
tree homomorphism $h$ determined by $h_{\mathcal{F}}$ defined by: $h_{\mathcal{F}}(g) = f(x_1, f(x_2, x_3))$,
$h_{\mathcal{F}}(a) = a$ and $h_{\mathcal{F}}(b) = b$. For instance, we have:

[Figure: a term $t$ and its image $h(t)$, drawn as trees.]

The homomorphism $h$ defines a transformation from ternary trees into binary
trees.

Let us now consider $\mathcal{F} = \{and(,), or(,), not(), 0, 1\}$ and $\mathcal{F}' = \{or(,), not(), 0, 1\}$.
Let us consider the tree homomorphism $h$ determined by $h_{\mathcal{F}}$ defined by:
$h_{\mathcal{F}}(and) = not(or(not(x_1), not(x_2)))$, and $h_{\mathcal{F}}$ is the identity otherwise. This homomorphism
transforms a boolean formula into an equivalent boolean formula which does not
contain the function symbol $and$.
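The two homomorphisms of Example 14 can be applied mechanically. In the following sketch, ground terms are nested tuples, variables are the strings `"x1"`, `"x2"`, ..., and the names `substitute` and `apply_hom` are hypothetical. The sample terms `t` and `formula` are chosen here for illustration and are not taken from the text.

```python
def substitute(term, sigma):
    """Apply the substitution sigma (dict variable -> term) to `term`."""
    if isinstance(term, str):            # a variable such as "x1"
        return sigma[term]
    symbol, *kids = term
    return (symbol, *(substitute(k, sigma) for k in kids))

def apply_hom(h, term):
    """h(f(t1, ..., tn)) = t_f{x1 <- h(t1), ..., xn <- h(tn)}."""
    symbol, *kids = term
    sigma = {f"x{i}": apply_hom(h, k) for i, k in enumerate(kids, start=1)}
    return substitute(h[symbol], sigma)

# First homomorphism of Example 14: ternary g becomes nested binary f.
H_TERNARY = {"g": ("f", "x1", ("f", "x2", "x3")), "a": ("a",), "b": ("b",)}

# Second homomorphism of Example 14: eliminate `and`, identity elsewhere.
H_AND = {
    "and": ("not", ("or", ("not", "x1"), ("not", "x2"))),
    "or": ("or", "x1", "x2"),
    "not": ("not", "x1"),
    "0": ("0",),
    "1": ("1",),
}

a, b = ("a",), ("b",)
t = ("g", a, ("g", b, b, b), a)          # sample ternary term (illustrative)
image = apply_hom(H_TERNARY, t)

formula = ("and", ("1",), ("0",))        # sample formula (illustrative)
rewritten = apply_hom(H_AND, formula)
```

The identity cases of the second homomorphism are written out explicitly so that `apply_hom` needs no special treatment for unchanged symbols.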

A tree homomorphism is linear if for each $f \in \mathcal{F}$ of arity $n$, $h_{\mathcal{F}}(f) = t_f$ is
a linear term in $T(\mathcal{F}', X_n)$. The following example shows that tree homomorphisms
do not always preserve recognizability.



Example 15. Let $\mathcal{F} = \{f(), g(), a\}$ and $\mathcal{F}' = \{f'(,), g(), a\}$. Let us consider
the tree homomorphism $h$ determined by $h_{\mathcal{F}}$ defined by: $h_{\mathcal{F}}(f) = f'(x_1, x_1)$,
$h_{\mathcal{F}}(g) = g(x_1)$, and $h_{\mathcal{F}}(a) = a$. $h$ is not linear. Let $L = \{f(g^i(a)) \mid i > 0\}$,
then $L$ is a recognizable tree language. $h(L) = \{f'(g^i(a), g^i(a)) \mid i > 0\}$ is not
recognizable (see Example 12).



Theorem 6 (Linear homomorphisms preserve recognizability). Let $h$
be a linear tree homomorphism, and $L$ be a recognizable tree language, then $h(L)$
is a recognizable tree language.

Proof. Let $L$ be a recognizable tree language. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a
reduced DFTA such that $L(\mathcal{A}) = L$. Let $h$ be a linear tree homomorphism from
$T(\mathcal{F})$ into $T(\mathcal{F}')$ determined by a mapping $h_{\mathcal{F}}$.

First, let us define a NFTA $\mathcal{A}' = (Q', \mathcal{F}', Q'_f, \Delta')$. Let us consider a rule
$r = f(q_1, \dots, q_n) \to q$ in $\Delta$ and consider the linear term $t_f = h_{\mathcal{F}}(f) \in T(\mathcal{F}', X_n)$ and
the set of positions $\mathit{Pos}(t_f)$. We define a set of states $Q^r = \{q^r_p \mid p \in \mathit{Pos}(t_f)\}$,
and we define a set of rules $\Delta_r$ as follows: for all positions $p$ in $\mathit{Pos}(t_f)$

• if $t_f(p) = g \in \mathcal{F}'_k$, then $g(q^r_{p1}, \dots, q^r_{pk}) \to q^r_p \in \Delta_r$,

• if $t_f(p) = x_i$, then $q_i \to q^r_p \in \Delta_r$,

• $q^r_\epsilon \to q \in \Delta_r$.

The preceding construction is made for each rule in $\Delta$. We suppose that all the
state sets $Q^r$ are disjoint and that they are disjoint from $Q$. Now define $\mathcal{A}'$ by:

• $Q' = Q \cup \bigcup_{r \in \Delta} Q^r$,

• $Q'_f = Q_f$,

• $\Delta' = \bigcup_{r \in \Delta} \Delta_r$.

Second, we have to prove that $h(L) = L(\mathcal{A}')$.

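$h(L) \subseteq L(\mathcal{A}')$. We prove that if $t \xrightarrow[\mathcal{A}]{*} q$ then $h(t) \xrightarrow[\mathcal{A}']{*} q$ by induction on the
length of the reduction of the ground term $t \in T(\mathcal{F})$ by the automaton $\mathcal{A}$.

• Base case. Suppose that $t \xrightarrow[\mathcal{A}]{} q$. Then $t = a \in \mathcal{F}_0$ and $a \to q \in \Delta$.
Then there is a reduction $h(a) = t_a \xrightarrow[\mathcal{A}']{*} q$ using the rules in the set
$\Delta_{a \to q}$.

• Induction step.
Suppose that $t = f(u_1, \dots, u_n)$, then $h(t) = t_f\{x_1 \leftarrow h(u_1), \dots, x_n \leftarrow h(u_n)\}$.
Moreover suppose that $t \xrightarrow[\mathcal{A}]{*} f(q_1, \dots, q_n) \xrightarrow[\mathcal{A}]{} q$. By induction
hypothesis, we have $h(u_i) \xrightarrow[\mathcal{A}']{*} q_i$, for each $i$ in $\{1, \dots, n\}$. Then
there is a reduction $t_f\{x_1 \leftarrow q_1, \dots, x_n \leftarrow q_n\} \xrightarrow[\mathcal{A}']{*} q$ using the rules in
the set $\Delta_{f(q_1, \dots, q_n) \to q}$.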

$h(L) \supseteq L(\mathcal{A}')$. We prove that if $t' \xrightarrow[\mathcal{A}']{*} q \in Q$ then $t' = h(t)$ with $t \xrightarrow[\mathcal{A}]{*} q$ for
some $t \in T(\mathcal{F})$. The proof is by induction on the number of states in $Q$
occurring along the reduction $t' \xrightarrow[\mathcal{A}']{*} q \in Q$.

• Base case. Suppose that $t' \xrightarrow[\mathcal{A}']{*} q \in Q$ and no state in $Q$ apart from $q$
occurs in the reduction. Then, because the state sets $Q^r$ are disjoint,
only rules of some $\Delta_r$ can be used in the reduction. Thus, $t'$ is ground,
$t' = h_{\mathcal{F}}(f)$ for some symbol $f \in \mathcal{F}$, and $r = f(q_1, \dots, q_n) \to q$.
Because the automaton is reduced, there is some ground term $t$ with
$\mathit{Head}(t) = f$ such that $t' = h(t)$ and $t \xrightarrow[\mathcal{A}]{*} q$.

• Induction step. Suppose that

$$t' \xrightarrow[\mathcal{A}']{*} v\{x'_1 \leftarrow q_1, \dots, x'_m \leftarrow q_m\} \xrightarrow[\mathcal{A}']{*} q$$

where $v$ is a linear term in $T(\mathcal{F}', \{x'_1, \dots, x'_m\})$, $t' = v\{x'_1 \leftarrow u'_1, \dots, x'_m \leftarrow u'_m\}$,
$u'_i \xrightarrow[\mathcal{A}']{*} q_i \in Q$, and no state in $Q$ apart from $q$ occurs in the
reduction of $v\{x'_1 \leftarrow q_1, \dots, x'_m \leftarrow q_m\}$ to $q$. The reader should
note that different variables can be substituted by the same state.
Then, because the state sets $Q^r$ are disjoint, only rules of some
$\Delta_r$ can be used in the reduction of $v\{x'_1 \leftarrow q_1, \dots, x'_m \leftarrow q_m\}$ to $q$.
Thus, there exists some linear term $t_f$ such that
$v\{x'_1 \leftarrow q_1, \dots, x'_m \leftarrow q_m\} = t_f\{x_1 \leftarrow q_1, \dots, x_n \leftarrow q_n\}$ for some symbol $f \in \mathcal{F}_n$ and
$r = f(q_1, \dots, q_n) \to q \in \Delta$. By induction hypothesis, there are
terms $u_1, \dots, u_m$ in $L$ such that $u'_i = h(u_i)$ and $u_i \xrightarrow[\mathcal{A}]{*} q_i$ for each
$i$ in $\{1, \dots, m\}$. Now consider the term $t = f(v_1, \dots, v_n)$, where
$v_i = u_i$ if $x_i$ occurs in $t_f$ and $v_i$ is some term such that $v_i \xrightarrow[\mathcal{A}]{*} q_i$
otherwise (terms $v_i$ always exist because $\mathcal{A}$ is reduced). We have
$h(t) = t_f\{x_1 \leftarrow h(v_1), \dots, x_n \leftarrow h(v_n)\}$, $h(t) = v\{x'_1 \leftarrow h(u_1), \dots, x'_m \leftarrow h(u_m)\}$,
$h(t) = t'$. Moreover, by definition of the $v_i$ and by induction
hypothesis, we have $t \xrightarrow[\mathcal{A}]{*} q$. Note that if $q_i$ occurs more than
once, you can substitute $q_i$ by any term satisfying the conditions.

The proof does not work for the non linear case because you have to
check that different occurrences of some state $q_i$ corresponding to the
same variable $x_j \in \mathit{Var}(t_f)$ can only be substituted by equal terms.

□

Only linear tree homomorphisms preserve recognizability. An example of a
non linear homomorphism which transforms recognizable tree languages either
into recognizable tree languages or into non recognizable tree languages is given in
Exercise 6. For linear and non linear homomorphisms, we have:

Theorem 7 (Inverse homomorphisms preserve recognizability). Let $h$
be a tree homomorphism and $L$ be a recognizable tree language, then $h^{-1}(L)$ is
a recognizable tree language.

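Proof. Let $h$ be a tree homomorphism from $T(\mathcal{F})$ into $T(\mathcal{F}')$ determined by a
mapping $h_{\mathcal{F}}$. Let $\mathcal{A}' = (Q', \mathcal{F}', Q'_f, \Delta')$ be a complete DFTA such that $L(\mathcal{A}') = L$.
We define a DFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ by $Q = Q' \cup \{s\}$ where $s \notin Q'$,
$Q_f = Q'_f$ and $\Delta$ is defined by the following:

• for $a \in \mathcal{F}_0$, if $t_a \xrightarrow[\mathcal{A}']{*} q$ then $a \to q \in \Delta$;

• for $f \in \mathcal{F}_n$ where $n > 0$, if $t_f\{x_1 \leftarrow p_1, \dots, x_n \leftarrow p_n\} \xrightarrow[\mathcal{A}']{*} q$ then
$f(q_1, \dots, q_n) \to q \in \Delta$ where $q_i = p_i$ if $x_i$ occurs in $t_f$ and $q_i = s$ otherwise;

• for $a \in \mathcal{F}_0$, $a \to s \in \Delta$;

• for $f \in \mathcal{F}_n$ where $n > 0$, $f(s, \dots, s) \to s \in \Delta$.

The rule set $\Delta$ is computable. The proof of the equivalence $t \xrightarrow[\mathcal{A}]{*} q$ if and only
if $h(t) \xrightarrow[\mathcal{A}']{*} q$ is left to the reader. □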

It can be proved that the class of recognizable tree languages is the smallest
non trivial class of tree languages closed by linear tree homomorphisms and
inverse tree homomorphisms. Tree homomorphisms do not in general preserve
recognizability, therefore let us consider the following problem: given as
instance a recognizable tree language $L$ and a tree homomorphism $h$, is the set
$h(L)$ recognizable? To our knowledge it is not known whether this problem is
decidable. The reader should note that if this problem is decidable, the problem
whether the set of normal forms of a rewrite system is recognizable is easily
shown decidable (see Exercises 6 and 12).

As a conclusion we consider different special types of tree homomorphisms.
These homomorphisms will be used in the next sections in order to simplify
some proofs and will be useful in Chapter 6. Let $h$ be a tree homomorphism
determined by $h_{\mathcal{F}}$. The tree homomorphism $h$ is said to be:

• $\epsilon$-free if for each symbol $f \in \mathcal{F}$, $t_f$ is not reduced to a variable.

• symbol to symbol if for each symbol $f \in \mathcal{F}$, $\mathit{Height}(t_f) = 1$. The reader
should note that with our definitions a symbol to symbol tree homomorphism
is $\epsilon$-free. A linear symbol to symbol tree homomorphism changes
the label of the input symbol, possibly erases some subtrees and possibly
modifies the order of subtrees.

• complete if for each symbol $f \in \mathcal{F}_n$, $\mathit{Var}(t_f) = X_n$.

• a delabeling if $h$ is a complete, linear, symbol to symbol tree homomorphism.
Such a delabeling only changes the label of the input symbol and
possibly the order of subtrees.

• alphabetic if for each symbol $f \in \mathcal{F}_n$, $t_f = g(x_1, \dots, x_n)$, where $g \in \mathcal{F}'_n$.

As a corollary of Theorem 6, alphabetic tree homomorphisms, delabelings and
linear, symbol to symbol tree homomorphisms preserve recognizability. It can
be proved that for these classes of tree homomorphisms, given $h$ and a FTA $\mathcal{A}$
such that $L(\mathcal{A}) = L$ as instance, a FTA for the recognizable tree language $h(L)$
can be constructed in linear time. The same holds for $h^{-1}(L)$.

Example 16. Let $\mathcal{F} = \{f(,), g(), a\}$ and $\mathcal{F}' = \{f'(,), g'(), a'\}$. Let us consider
some tree homomorphisms $h$ determined by different $h_{\mathcal{F}}$.

• $h_{\mathcal{F}}(f) = x_1$, $h_{\mathcal{F}}(g) = f'(x_1, x_1)$, and $h_{\mathcal{F}}(a) = a'$. $h$ is not linear, not
$\epsilon$-free, and not complete.

• $h_{\mathcal{F}}(f) = g'(x_1)$, $h_{\mathcal{F}}(g) = f'(x_1, x_1)$, and $h_{\mathcal{F}}(a) = a'$. $h$ is a non linear
symbol to symbol tree homomorphism. $h$ is not complete.

• $h_{\mathcal{F}}(f) = f'(x_2, x_1)$, $h_{\mathcal{F}}(g) = g'(x_1)$, and $h_{\mathcal{F}}(a) = a'$. $h$ is a delabeling.

• $h_{\mathcal{F}}(f) = f'(x_1, x_2)$, $h_{\mathcal{F}}(g) = g'(x_1)$, and $h_{\mathcal{F}}(a) = a'$. $h$ is an alphabetic
tree homomorphism.
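The properties listed above are syntactic and can be checked directly on the terms $t_f$. The sketch below classifies the four homomorphisms of Example 16. The term encoding (nested tuples, variables as strings) and the function names are assumptions of this illustration, and the height convention used (0 for a variable, 1 for a constant) is assumed to be the one fixed in the preliminaries of the book, which lie outside this section.

```python
def variables(t):
    """Variables of a term, listed with multiplicity."""
    if isinstance(t, str):
        return [t]
    return [v for kid in t[1:] for v in variables(kid)]

def height(t):
    """Assumed convention: 0 for a variable, 1 for a constant."""
    if isinstance(t, str):
        return 0
    return 1 + max((height(kid) for kid in t[1:]), default=0)

def classify(h, arities):
    """Return the properties of the homomorphism h: symbol -> term t_f."""
    xs = {f: [f"x{i}" for i in range(1, arities[f] + 1)] for f in h}
    linear = all(len(variables(t)) == len(set(variables(t))) for t in h.values())
    eps_free = all(not isinstance(t, str) for t in h.values())
    sym_to_sym = all(height(t) == 1 for t in h.values())
    complete = all(set(variables(t)) == set(xs[f]) for f, t in h.items())
    alphabetic = all(not isinstance(t, str) and list(t[1:]) == xs[f]
                     for f, t in h.items())
    return {
        "linear": linear,
        "epsilon_free": eps_free,
        "symbol_to_symbol": sym_to_sym,
        "complete": complete,
        "delabeling": linear and sym_to_sym and complete,
        "alphabetic": alphabetic,
    }

ARITIES = {"f": 2, "g": 1, "a": 0}
H1 = {"f": "x1", "g": ("f'", "x1", "x1"), "a": ("a'",)}
H2 = {"f": ("g'", "x1"), "g": ("f'", "x1", "x1"), "a": ("a'",)}
H3 = {"f": ("f'", "x2", "x1"), "g": ("g'", "x1"), "a": ("a'",)}
H4 = {"f": ("f'", "x1", "x2"), "g": ("g'", "x1"), "a": ("a'",)}
```

Calling `classify` on `H1` to `H4` reproduces the four verdicts of Example 16; note that `H4` is also reported as a delabeling, since an alphabetic homomorphism is complete, linear and symbol to symbol.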



1.5 Minimizing Tree Automata 

In this section, we prove that, like in the word case, there exists a unique minimal 
automaton in the number of states for a given recognizable tree language. 

A Myhill-Nerode Theorem for Tree Languages 

The Myhill-Nerode Theorem is a classical result in the theory of finite
automata. This theorem gives a characterization of the recognizable sets and it
has numerous applications. A consequence of this theorem, among other
consequences, is that there is essentially a unique minimum state DFA for every
recognizable language over a finite alphabet. The Myhill-Nerode Theorem
generalizes in a straightforward way to automata on finite trees.

An equivalence relation $\equiv$ on $T(\mathcal{F})$ is a congruence on $T(\mathcal{F})$ if for every
$f \in \mathcal{F}_n$

$$u_i \equiv v_i,\ 1 \leq i \leq n \implies f(u_1, \dots, u_n) \equiv f(v_1, \dots, v_n).$$

It is of finite index if there are only finitely many $\equiv$-classes. Equivalently
a congruence is an equivalence relation closed under context, i.e. for all contexts
$C \in \mathcal{C}(\mathcal{F})$, if $u \equiv v$, then $C[u] \equiv C[v]$. For a given tree language $L$, let us define
the congruence $\equiv_L$ on $T(\mathcal{F})$ by: $u \equiv_L v$ if for all contexts $C \in \mathcal{C}(\mathcal{F})$,

$$C[u] \in L \text{ iff } C[v] \in L.$$

We are now ready to give the Theorem:

Myhill-Nerode Theorem. The following three statements are equivalent:

(i) $L$ is a recognizable tree language

(ii) $L$ is the union of some equivalence classes of a congruence of finite index

(iii) the relation $\equiv_L$ is a congruence of finite index.

Proof.

• (i) $\Rightarrow$ (ii) Assume that $L$ is recognized by some complete DFTA
$\mathcal{A} = (Q, \mathcal{F}, Q_f, \delta)$. We consider $\delta$ as a transition function. Let us consider
the relation $\equiv_{\mathcal{A}}$ defined on $T(\mathcal{F})$ by: $u \equiv_{\mathcal{A}} v$ if $\delta(u) = \delta(v)$. Clearly
$\equiv_{\mathcal{A}}$ is a congruence relation and it is of finite index, since the number of
equivalence classes is at most the number of states in $Q$. Furthermore, $L$
is the union of those equivalence classes that include a term $u$ such that
$\delta(u)$ is a final state.

• (ii) $\Rightarrow$ (iii) Let us denote by $\sim$ the congruence of finite index. And let us
assume that $u \sim v$. By an easy induction on the structure of terms, it can
be proved that $C[u] \sim C[v]$ for all contexts $C \in \mathcal{C}(\mathcal{F})$. Now, $L$ is the union
of some equivalence classes of $\sim$, thus we have $C[u] \in L$ iff $C[v] \in L$. Thus
$u \equiv_L v$, and the equivalence class of $u$ in $\sim$ is contained in the equivalence
class of $u$ in $\equiv_L$. Consequently, the index of $\equiv_L$ is lower than or equal to
the index of $\sim$, which is finite.

• (iii) $\Rightarrow$ (i) Let $Q_{min}$ be the finite set of equivalence classes of $\equiv_L$. And
let us denote by $[u]$ the equivalence class of a term $u$. Let the transition
function $\delta_{min}$ be defined by:

$$\delta_{min}(f, [u_1], \dots, [u_n]) = [f(u_1, \dots, u_n)].$$

The definition of $\delta_{min}$ is consistent because $\equiv_L$ is a congruence. And
let $Q_{min_f} = \{[u] \mid u \in L\}$. The DFTA $\mathcal{A}_{min} = (Q_{min}, \mathcal{F}, Q_{min_f}, \delta_{min})$
recognizes the tree language $L$.

□

As a corollary of the Myhill-Nerode Theorem, we can deduce another
algebraic characterization of recognizable tree languages. This characterization
is a reformulation of the definition of recognizability. A set of ground terms
$L$ is recognizable if and only if there exist a finite $\mathcal{F}$-algebra $\mathcal{A}$, an $\mathcal{F}$-algebra
homomorphism $\phi : T(\mathcal{F}) \to \mathcal{A}$ and a subset $A'$ of the carrier $|\mathcal{A}|$ of $\mathcal{A}$ such
that $L = \phi^{-1}(A')$.

Minimization of Tree Automata

First, we prove the existence and uniqueness of the minimum DFTA for a
recognizable tree language. It is a consequence of the Myhill-Nerode Theorem
because of the following result:

Corollary 2. The minimum DFTA recognizing a recognizable tree language $L$
is unique up to a renaming of the states and is given by $\mathcal{A}_{min}$ in the proof of
the Myhill-Nerode Theorem.

Proof. Assume that $L$ is recognized by some DFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \delta)$. The
relation $\equiv_{\mathcal{A}}$ is a refinement of $\equiv_L$ (see the proof of the Myhill-Nerode Theorem).
Therefore the number of states of $\mathcal{A}$ is greater than or equal to the number of
states of $\mathcal{A}_{min}$. If equality holds, $\mathcal{A}$ is reduced, i.e. all states are accessible,
because otherwise a state could be removed, leading to a contradiction. Let $q$
be a state in $Q$ and let $u$ be such that $\delta(u) = q$. The state $q$ can be identified
with the state $\delta_{min}(u)$. This identification is consistent and defines a one to one
correspondence between $Q$ and $Q_{min}$. □

Second, we give a minimization algorithm for finding the minimum state
DFTA equivalent to a given reduced DFTA. We identify an equivalence relation
and the sequence of its equivalence classes.



Minimization Algorithm MIN
input: complete and reduced DFTA $\mathcal{A} = (Q, \mathcal{F}, Q_f, \delta)$
begin
    Set $P$ to $\{Q_f, Q - Q_f\}$ /* P is the initial equivalence relation */
    repeat
        $P' = P$
        /* Refine equivalence P' in P */
        $q\,P\,q'$ if
            $q\,P'\,q'$ and
            $\forall f \in \mathcal{F}_n\ \forall q_1, \dots, q_{i-1}, q_{i+1}, \dots, q_n \in Q$
            $\delta(f(q_1, \dots, q_{i-1}, q, q_{i+1}, \dots, q_n))\ P'\ \delta(f(q_1, \dots, q_{i-1}, q', q_{i+1}, \dots, q_n))$
    until $P' = P$
    Set $Q_{min}$ to the set of equivalence classes of $P$
    /* we denote by [q] the equivalence class of state q w.r.t. P */
    Set $\delta_{min}$ to $\{(f, [q_1], \dots, [q_n]) \to [\delta(f(q_1, \dots, q_n))]\}$
    Set $Q_{min_f}$ to $\{[q] \mid q \in Q_f\}$
output: DFTA $\mathcal{A}_{min} = (Q_{min}, \mathcal{F}, Q_{min_f}, \delta_{min})$
end
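The refinement loop of MIN can be sketched as follows for a complete and reduced DFTA given as a transition table. The names `signature` and `minimize` and the sample automaton (a four-state cycle over a unary symbol `g`, accepting terms with an even number of `g`) are assumptions made for this illustration. States are assumed to be sortable, here strings.

```python
from itertools import product

def signature(q, block, states, arities, delta):
    """Current block of q plus the blocks reached when q fills any argument slot."""
    sig = [block[q]]
    for f, n in sorted(arities.items()):
        for i in range(n):
            for rest in product(sorted(states), repeat=n - 1):
                args = rest[:i] + (q,) + rest[i:]
                sig.append(block[delta[(f, args)]])
    return tuple(sig)

def minimize(states, arities, delta, finals):
    """Partition refinement starting from {Q_f, Q - Q_f}."""
    block = {q: (q in finals) for q in states}
    while True:
        new_block = {q: signature(q, block, states, arities, delta) for q in states}
        # new_block refines block, so an equal number of classes means P' = P
        if len(set(new_block.values())) == len(set(block.values())):
            break
        block = new_block
    cls = {q: frozenset(p for p in states if block[p] == block[q]) for q in states}
    q_min = set(cls.values())
    delta_min = {(f, tuple(cls[q] for q in args)): cls[t]
                 for (f, args), t in delta.items()}
    finals_min = {cls[q] for q in finals}
    return q_min, delta_min, finals_min

# Hypothetical DFTA over {g(), a}: a cycle of four states, two of them final.
STATES = {"q0", "q1", "q2", "q3"}
ARITIES = {"a": 0, "g": 1}
DELTA = {("a", ()): "q0", ("g", ("q0",)): "q1", ("g", ("q1",)): "q2",
         ("g", ("q2",)): "q3", ("g", ("q3",)): "q0"}
FINALS = {"q0", "q2"}

q_min, delta_min, finals_min = minimize(STATES, ARITIES, DELTA, FINALS)
```

On this input the four states collapse into the two classes `{q0, q2}` and `{q1, q3}`.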



The DFTA constructed by the algorithm MIN is the minimum state DFTA
for its tree language. Indeed, let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be the DFTA to which
the algorithm is applied and let $L = L(\mathcal{A})$. Let $\mathcal{A}_{min}$ be the output of the algorithm.
It is easy to show that the definition of $\mathcal{A}_{min}$ is consistent and that $L = L(\mathcal{A}_{min})$.
Now, by contradiction, we can prove that $\mathcal{A}_{min}$ has no more states than the
number of equivalence classes of $\equiv_L$.

1.6 Top Down Tree Automata

The tree automata that we have defined in the previous sections are also known
as bottom-up tree automata because these automata start their computation at
the leaves of trees. In this section we define top-down tree automata. Such an
automaton starts its computation at the root in an initial state and then
simultaneously works down the paths of the tree level by level. The tree automaton
accepts a tree if a run built up in this fashion can be defined. It appears that
top-down tree automata and bottom-up tree automata have the same expressive
power. An important difference between bottom-up tree automata and
top-down automata appears in the question of determinism, since deterministic
top-down tree automata are strictly less powerful than nondeterministic ones
and therefore are strictly less powerful than bottom-up tree automata.
Intuitively, it is due to the following: tree properties specified by deterministic
top-down tree automata can depend only on path properties. We now make
these remarks precise, but first formally define top-down tree automata.

A nondeterministic top-down finite Tree Automaton (top-down NFTA)
over $\mathcal{F}$ is a tuple $\mathcal{A} = (Q, \mathcal{F}, I, \Delta)$ where $Q$ is a set of states (states are unary
symbols), $I \subseteq Q$ is a set of initial states, and $\Delta$ is a set of rewrite rules of the
following type:

$$q(f(x_1, \dots, x_n)) \to f(q_1(x_1), \dots, q_n(x_n)),$$

where $n \geq 0$, $f \in \mathcal{F}_n$, $q, q_1, \dots, q_n \in Q$, $x_1, \dots, x_n \in \mathcal{X}$.

When $n = 0$, i.e. when the symbol is a constant symbol $a$, a transition rule of
a top-down NFTA is of the form $q(a) \to a$. A top-down automaton starts at the
root and moves downward, associating along a run a state with each subterm
inductively. We do not formally define the move relation $\xrightarrow[\mathcal{A}]{}$ defined by a
top-down NFTA because the definition is easily deduced from the corresponding
definition for bottom-up NFTA. The tree language $L(\mathcal{A})$ recognized by $\mathcal{A}$ is the
set of all ground terms $t$ for which there is an initial state $q$ in $I$ such that

$$q(t) \xrightarrow[\mathcal{A}]{*} t.$$

The expressive power of bottom-up and top-down tree automata is the same.
Indeed, we have the following Theorem:

Theorem 8 (The equivalence of top-down and bottom-up NFTAs).
The class of languages accepted by top-down NFTAs is exactly the class of
recognizable tree languages.

Proof. The proof is left to the reader. Hint. Reverse the arrows and exchange
the sets of initial and final states. □

Top-down and bottom-up tree automata have the same expressive power
because they define the same classes of tree languages. Nevertheless they do
not have the same behavior from an algorithmic point of view because
nondeterminism can not be reduced in the class of top-down tree automata.

Proposition 1 (Top-down NFTAs and top-down DFTAs). A top-down
finite Tree Automaton $(Q, \mathcal{F}, I, \Delta)$ is deterministic (top-down DFTA) if there is
one initial state and no two rules with the same left-hand side. Top-down DFTAs
are strictly less powerful than top-down NFTAs, i.e. there exists a recognizable
tree language which is not accepted by a top-down DFTA.

Proof. Let $\mathcal{F} = \{f(,), a, b\}$. And let us consider the recognizable tree language
$T = \{f(a, b), f(b, a)\}$. Now let us suppose there exists a top-down DFTA that
accepts $T$; the automaton should accept the term $f(a, a)$, leading to a
contradiction. Obviously the tree language $T = \{f(a, b), f(b, a)\}$ is recognizable by a
finite union of top-down DFTA, but there is a recognizable tree language which
is not accepted by a finite union of top-down DFTA (see Exercise 2). □
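A top-down run can be simulated by a short recursive check. In this sketch, a rule $q(f(x_1, \dots, x_n)) \to f(q_1(x_1), \dots, q_n(x_n))$ is stored as the tuple `(q1, ..., qn)` under the key `(q, f)`; this encoding, the function names and the state names `q0`, `qa`, `qb` are assumptions made for the illustration. The automaton below is a nondeterministic top-down automaton for the language $T$ of the proof: it has two rules with the same left-hand side, which is exactly what a top-down DFTA may not have.

```python
def accepts_from(term, state, rules):
    """True if state(term) rewrites to term using the top-down rules.

    rules: dict (state, symbol) -> set of tuples of child states;
    a constant rule q(a) -> a is recorded as the empty tuple.
    """
    f, *kids = term
    return any(
        len(choice) == len(kids)
        and all(accepts_from(k, q, rules) for k, q in zip(kids, choice))
        for choice in rules.get((state, f), ())
    )

def accepts(term, initial, rules):
    """A term is accepted if some initial state accepts it."""
    return any(accepts_from(term, q, rules) for q in initial)

# Top-down NFTA for T = {f(a, b), f(b, a)} (state names are hypothetical).
INITIAL = {"q0"}
RULES = {
    ("q0", "f"): {("qa", "qb"), ("qb", "qa")},   # two rules, same left-hand side
    ("qa", "a"): {()},
    ("qb", "b"): {()},
}

a, b = ("a",), ("b",)
```

With a single rule for `("q0", "f")`, say with children `(q1, q2)`, accepting both `f(a, b)` and `f(b, a)` would force `q1` and `q2` to accept both constants, and hence `f(a, a)` as well; this is the contradiction used in the proof.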

1.7 Decision Problems and their Complexity

In this section, we study some decision problems and their complexity. The size
of an automaton will be the size of its representation. More formally:

Definition 1. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a NFTA over $\mathcal{F}$. The size of a rule
$f(q_1(x_1), \dots, q_n(x_n)) \to q(f(x_1, \dots, x_n))$ is $\mathit{arity}(f) + 2$. The size of $\mathcal{A}$, noted
$\|\mathcal{A}\|$, is defined by:

$$\|\mathcal{A}\| = |Q| + \sum_{f(q_1(x_1), \dots, q_n(x_n)) \to q(f(x_1, \dots, x_n)) \in \Delta} (\mathit{arity}(f) + 2).$$

We will work in the frame of RAM machines, with uniform measure.
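As a small worked instance of Definition 1, the following sketch computes the size of the NFTA of Example 9. The rule encoding as triples `(symbol, argument_states, target)` and the function name `automaton_size` are assumptions of this illustration.

```python
def automaton_size(states, rules):
    """|Q| plus (arity(f) + 2) for every rule f(q1, ..., qn) -> q."""
    return len(states) + sum(len(args) + 2 for _symbol, args, _target in rules)

# NFTA of Example 9: three states and five rules.
STATES = {"q", "qg", "qf"}
RULES = [
    ("a", (), "q"),            # size 0 + 2
    ("g", ("q",), "q"),        # size 1 + 2
    ("g", ("q",), "qg"),       # size 1 + 2
    ("g", ("qg",), "qf"),      # size 1 + 2
    ("f", ("q", "q"), "q"),    # size 2 + 2
]

size = automaton_size(STATES, RULES)   # 3 + (2 + 3 + 3 + 3 + 4) = 18
```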

Membership 

Instance A ground term. 

Answer "yes" if and only if the term is recognized by a given automaton. 

Let us first remark that, in our model, for a given deterministic automaton,
a run on a tree can be computed in $O(\|t\|)$. The complexity of the problem is:

Theorem 9. The membership problem is ALOGTIME-complete. 

Uniform Membership 

Instance A tree automaton and a ground term. 

Answer "yes" if and only if the term is recognized by the given automaton. 

Theorem 10. The uniform membership problem can be decided in linear time 
for DFTA, in polynomial time for NFTA. 

Proof. In the deterministic case, from a term t and the automaton A, we can
compute a run in O(‖t‖ + ‖A‖). In the nondeterministic case, the idea is similar
to the word case: the algorithm determinizes along the computation, i.e. for each
node of the term, we compute the set of reached states. The complexity of this
algorithm will be in O(‖t‖ × ‖A‖).

□ 
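
The nondeterministic case of the proof can be sketched as follows. This is only an illustration under assumed encodings that do not come from the text: a term is a nested tuple (symbol, child_1, ..., child_n), and a rule f(q_1, ..., q_n) → q is a triple (f, (q_1, ..., q_n), q). Each node scans the rule list once, which matches the O(‖t‖ × ‖A‖) bound.

```python
def reached_states(term, rules):
    """Set of states q such that term ->* q, computed bottom-up."""
    symbol, children = term[0], term[1:]
    child_sets = [reached_states(child, rules) for child in children]
    return {
        q
        for f, args, q in rules
        if f == symbol
        and len(args) == len(children)
        and all(a in reachable for a, reachable in zip(args, child_sets))
    }


def accepts(term, rules, final_states):
    """True if some final state is reached at the root of term."""
    return bool(reached_states(term, rules) & final_states)


# Hypothetical NFTA for the language {f(a, b), f(b, a)} of Proposition 1.
rules = [
    ("a", (), "qa"),
    ("b", (), "qb"),
    ("f", ("qa", "qb"), "qf"),
    ("f", ("qb", "qa"), "qf"),
]
final_states = {"qf"}

in_language = accepts(("f", ("a",), ("b",)), rules, final_states)
not_in_language = accepts(("f", ("a",), ("a",)), rules, final_states)
```

On this example the bottom-up automaton rejects f(a, a), which is exactly the term that any top-down DFTA accepting both f(a, b) and f(b, a) would be forced to accept.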

The uniform membership problem has been proved LOGSPACE-complete
for deterministic top-down tree automata, LOGCFL-complete for NFTA under
log-space reductions. For DFTA, it has been proven to be in LOGDCFL, but the
precise complexity remains open.

Emptiness

Instance A tree automaton 

Answer "yes" if and only if the recognized language is empty. 

Theorem 11. It can be decided in linear time whether the language accepted 
by a finite tree automaton is empty. 

Proof. The minimal height of accepted terms can be bounded by the number of
states using Corollary 1; so, as membership is decidable, emptiness is decidable.
Of course, this approach does not provide a practicable algorithm. To get an
efficient algorithm, it suffices to notice that a NFTA accepts at least one tree
if and only if there is an accessible final state. In other words, the language
recognized by a reduced automaton is empty if and only if the set of final
states is empty. Reducing an automaton can be done in O(|Q| × ‖A‖)
by the reduction algorithm given in Section 1.1. Actually, this algorithm can
be improved by choosing an adequate data structure in order to get a linear
algorithm (see Exercise 17). This linear least fixpoint computation holds in
several frameworks. For example, it can be viewed as the satisfiability test of
a set of propositional Horn formulae. The reduction is easy and linear: each
state q can be associated with a propositional variable X_q and each rule
r : f(q_1, ..., q_n) → q can be associated with a propositional Horn formula
F_r = X_q ∨ ¬X_{q_1} ∨ ... ∨ ¬X_{q_n}. It is straightforward that satisfiability
of {F_r} ∪ {¬X_q | q ∈ Q_f} is equivalent to emptiness of the language
recognized by (Q, F, Q_f, Δ). So, as satisfiability of a set of propositional
Horn formulae can be decided in linear time, we get a linear algorithm for
testing emptiness for NFTA. □
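
One possible way to obtain the linear bound is the counter-based propagation familiar from Horn satisfiability. The sketch below is an illustration, not the authors' algorithm; it assumes rules are stored as triples (f, (q_1, ..., q_n), q), and the names `counters` and `occurrences` are introduced here only for the example.

```python
from collections import deque


def is_empty(rules, final_states):
    """Return True if no final state is accessible."""
    # counters[i]: left-hand-side occurrences of rule i not yet known accessible.
    counters = [len(args) for _, args, _ in rules]
    # occurrences[q]: indices of rules using q, one entry per occurrence.
    occurrences = {}
    for i, (_, args, _) in enumerate(rules):
        for q in args:
            occurrences.setdefault(q, []).append(i)

    accessible = set()
    queue = deque()

    def activate(q):
        if q not in accessible:
            accessible.add(q)
            queue.append(q)

    # Rules for constants fire immediately.
    for i, (_, _, q) in enumerate(rules):
        if counters[i] == 0:
            activate(q)

    # Each state is dequeued once and each occurrence is visited once.
    while queue:
        q = queue.popleft()
        for i in occurrences.get(q, ()):
            counters[i] -= 1
            if counters[i] == 0:
                activate(rules[i][2])

    return not (accessible & final_states)


# Hypothetical automaton: qx is never accessible.
rules = [("a", (), "qa"), ("f", ("qa", "qa"), "qf"), ("g", ("qx",), "qx")]
nonempty_case = is_empty(rules, {"qf"})
empty_case = is_empty(rules, {"qx"})
```

Since every occurrence of a state in a left-hand side is decremented at most once, the running time is proportional to the size of the rule list.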

The emptiness problem is P-complete with respect to logspace reductions,
even when restricted to deterministic tree automata. The proof can easily be
done since the problem is very close to the solvable path systems problem which
is known to be P-complete (see Exercise 18).

Intersection non-emptiness 

Instance A finite sequence of tree automata.

Answer "yes" if and only if there is at least one term recognized by each
automaton of the sequence.

Theorem 12. The intersection problem for tree automata is EXPTIME-complete. 

Proof. By constructing the product automata for the n automata, and then
testing non-emptiness, we get an algorithm in O(‖A_1‖ × ··· × ‖A_n‖). The proof
of EXPTIME-hardness is based on simulation of an alternating linear
space-bounded Turing machine. Roughly speaking, with such a machine and an input
of length n can be associated polynomially n tree automata whose intersection
corresponds to the set of accepting computations on the input. It is worth
noting that the result holds for deterministic top-down tree automata as well as
for deterministic bottom-up ones. □
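
The upper bound in this proof can be illustrated by computing the accessible states of the product automaton. The following is a naive fixpoint sketch, exponential in the number of automata as expected; it assumes each automaton is given as a pair (rules, final_states) with rules stored as triples (f, (q_1, ..., q_n), q), and that the list contains at least one automaton.

```python
from itertools import product


def intersection_nonempty(automata):
    """automata: non-empty list of (rules, final_states) pairs."""
    rule_lists = [rules for rules, _ in automata]
    accessible = set()  # accessible product states, stored as tuples
    changed = True
    while changed:
        changed = False
        # One rule per automaton, all for the same symbol and arity.
        for combo in product(*rule_lists):
            symbol, arity = combo[0][0], len(combo[0][1])
            if any(f != symbol or len(args) != arity for f, args, _ in combo):
                continue
            # columns[j] is the product state required at argument position j.
            columns = [tuple(rule[1][j] for rule in combo) for j in range(arity)]
            target = tuple(rule[2] for rule in combo)
            if target not in accessible and all(c in accessible for c in columns):
                accessible.add(target)
                changed = True
    return any(
        all(q in finals for q, (_, finals) in zip(state, automata))
        for state in accessible
    )


# Hypothetical automata over {f(,), a, b}.
a1 = ([("a", (), "qa"), ("b", (), "qb"),
       ("f", ("qa", "qb"), "qf"), ("f", ("qb", "qa"), "qf")], {"qf"})  # {f(a,b), f(b,a)}
a2 = ([("a", (), "pa"), ("b", (), "pb"),
       ("f", ("pa", "pa"), "pf"), ("f", ("pa", "pb"), "pf")], {"pf"})  # left child is a
a3 = ([("b", (), "rb"), ("f", ("rb", "rb"), "rf")], {"rf"})            # {f(b,b)}

share_a_term = intersection_nonempty([a1, a2])
share_no_term = intersection_nonempty([a1, a3])
```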

Finiteness

Instance A tree automaton 

Answer "yes" if and only if the recognized language is finite. 

Theorem 13. Finiteness can be decided in polynomial time.

Proof. Let us consider a NFTA A = (Q, F, Q_f, Δ). Deciding finiteness of A is
direct by Corollary 1: it suffices to find an accepted term t s.t.
|Q| < ‖t‖ ≤ 2|Q|. A more efficient way to test finiteness is to check the
existence of a loop: the language is infinite if and only if there is a loop on
some useful state, i.e. there exist an accessible state q and contexts C and C'
(C being non-trivial) such that C[q] →*_A q and C'[q] →*_A q' for some final
state q'. Computing accessible and coaccessible states can be done in
O(|Q| × ‖A‖) or in O(‖A‖) by using an ad hoc representation of the automaton.
For a given q, deciding if there is a loop on q can be done in O(‖A‖). So,
finiteness can be decided in O(|Q| × ‖A‖). □
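
The loop criterion can be sketched as a cycle search in a graph on useful states. The code below is an illustration under assumptions made here: rules are triples (f, (q_1, ..., q_n), q), there are no ε-rules, and the fixpoint loops are the naive ones rather than the linear-time variants.

```python
def is_finite(rules, final_states):
    """Return True if the recognized language is finite."""
    # Accessible states: least fixpoint over the rules.
    accessible = set()
    changed = True
    while changed:
        changed = False
        for _, args, q in rules:
            if q not in accessible and all(a in accessible for a in args):
                accessible.add(q)
                changed = True

    # Keep only rules whose left-hand side can actually be reached.
    live = [(args, q) for _, args, q in rules
            if all(a in accessible for a in args)]

    # Coaccessible states: those from which a final state can be reached.
    coaccessible = set(final_states) & accessible
    changed = True
    while changed:
        changed = False
        for args, q in live:
            if q in coaccessible:
                for a in args:
                    if a not in coaccessible:
                        coaccessible.add(a)
                        changed = True

    # Edge a -> q whenever a live rule with useful target q has a in its lhs.
    successors = {}
    for args, q in live:
        if q in coaccessible:
            for a in args:
                successors.setdefault(a, set()).add(q)

    # Depth-first search for a cycle among useful states.
    color = {}

    def has_cycle(q):
        color[q] = "grey"
        for p in successors.get(q, ()):
            if color.get(p) == "grey":
                return True
            if p not in color and has_cycle(p):
                return True
        color[q] = "black"
        return False

    return not any(q not in color and has_cycle(q) for q in list(successors))


# Hypothetical examples.
loop_on_useful = is_finite([("a", (), "q"), ("g", ("q",), "q")], {"q"})
no_loop = is_finite([("a", (), "qa"), ("f", ("qa", "qa"), "qf")], {"qf"})
loop_on_useless = is_finite(
    [("a", (), "qa"), ("g", ("qa",), "qd"), ("g", ("qd",), "qd")], {"qa"})
```

The third example has a loop on qd, but qd is not coaccessible, so the language {a} is still reported as finite.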

Emptiness of the Complement

Instance A tree automaton.

Answer "yes" if and only if every term is accepted by the automaton.

Deciding whether a deterministic tree automaton recognizes the set of all
terms is polynomial for a fixed alphabet: we just have to check whether the
automaton is complete (which can be done in O(|F| × |Q|^Arity(F))) and then it
remains only to check that all accessible states are final. For nondeterministic
automata, the following result proves in some sense that determinization with
its exponential cost is unavoidable:

Theorem 14. The problem whether a tree automaton accepts the set of all
terms is EXPTIME-complete for nondeterministic tree automata.

Proof. The proof of this theorem is once more based on simulation of a linear
space bounded alternating Turing machine: indeed, the complement of the
accepting computations on an input w can be coded polynomially in a recognizable
tree language. □

Equivalence 

Instance Two tree automata.

Answer "yes" if and only if the automata recognize the same language.

Theorem 15. Equivalence is decidable for tree automata.

Proof. Clearly, as the class of recognizable sets is effectively closed under
complementation and intersection, and as emptiness is decidable, equivalence is
decidable. For two deterministic complete automata A_1 and A_2, we get by
these means an algorithm in O(‖A_1‖ × ‖A_2‖). (Another way is to compare
the minimal automata.) For nondeterministic ones, this approach leads to an
exponential algorithm. □

As we have proved that deciding whether an automaton recognizes the set
of all ground terms is EXPTIME-hard, we get immediately:

Corollary 3. The inclusion problem and the equivalence problem for NFTAs
are EXPTIME-complete.

Singleton Set Property 

Instance A tree automaton 

Answer "yes" if and only if the recognized language is a singleton set. 

Theorem 16. The singleton set property is decidable in polynomial time. 

Proof. There are several ways to get a polynomial algorithm for this property.
A first one would be to first check non-emptiness of L(A) and then "extract"
from A a DFTA B whose size is smaller than ‖A‖ and which accepts a single term
recognized by A. Then it remains to check emptiness of L(A) ∩ (T(F) \ L(B)),
the intersection of L(A) with the complement of L(B). This can
be done in polynomial time, even if B is not complete.

Another way is: for each state q of a bottom-up tree automaton A, compute,
up to 2, the number C(q) of terms leading to state q. This can be done in a
straightforward way when A is deterministic; when A is nondeterministic, this
can also be done in polynomial time:

Singleton Set Test Algorithm
input: NFTA A = (Q, F, Q_f, Δ)
begin
   Set C(q) to 0, for every q in Q
   /* C(q) ∈ {0, 1, 2} is the number, up to 2, of terms leading to state q */
   /* if C(q) = 1 then T(q) is a representation of the accepted tree */
   repeat
      for each rule f(q_1, ..., q_n) → q ∈ Δ do
         Case ∧_j C(q_j) ≥ 1 and C(q_i) = 2 for some i: Set C(q) to 2
         Case ∧_j C(q_j) = 1 and C(q) = 0: Set C(q) to 1, T(q) to f(q_1, ..., q_n)
         Case ∧_j C(q_j) = 1, C(q) = 1 and Diff(T(q), f(q_1, ..., q_n)):
            Set C(q) to 2
         Others null
      where Diff(f(q_1, ..., q_n), g(q'_1, ..., q'_n)) is defined by:
         /* Diff can be computed polynomially by using memoization. */
         if (f ≠ g) then return True
         elseif Diff(T(q_i), T(q'_i)) for some i then return True
         else return False
   until C can not be changed
output:
   /* L(A) is empty */
   if ∧_{q ∈ Q_f} C(q) = 0 then return False
   /* two terms in L(A) accepted in the same state or two different states */
   elseif ∃q ∈ Q_f, C(q) = 2 then return False
   elseif ∃q, q' ∈ Q_f, C(q) = C(q') = 1 and Diff(T(q), T(q')) then return False
   /* in all other cases L(A) is a singleton set */
   else return True.
end



□ 
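
The counting idea can also be written as a short runnable sketch. This is a variant, not a transcription of the pseudocode: instead of the recursive Diff test it assigns an integer identifier to each witness term through a hash-consing table (the names `witness` and `term_ids` are introduced here), so that comparing two witnesses is a comparison of integers. Rules are assumed to be triples (f, (q_1, ..., q_n), q).

```python
def is_singleton(rules, final_states):
    """Return True if the recognized language contains exactly one term."""
    count = {}     # state -> number of distinct terms reaching it, capped at 2
    witness = {}   # state -> id of its unique term, meaningful while count is 1
    term_ids = {}  # (symbol, child ids) -> id, a hash-consing table

    changed = True
    while changed:
        changed = False
        for f, args, q in rules:
            # The rule cannot fire yet, or q already has two terms.
            if any(count.get(a, 0) == 0 for a in args) or count.get(q, 0) == 2:
                continue
            if any(count[a] == 2 for a in args):
                count[q] = 2
                changed = True
                continue
            # Every argument state has exactly one term: build the term's id.
            key = (f, tuple(witness[a] for a in args))
            tid = term_ids.setdefault(key, len(term_ids))
            if count.get(q, 0) == 0:
                count[q], witness[q] = 1, tid
                changed = True
            elif witness[q] != tid:
                count[q] = 2
                changed = True

    finals = [q for q in final_states if count.get(q, 0) > 0]
    if not finals or any(count[q] == 2 for q in finals):
        return False
    return len({witness[q] for q in finals}) == 1


# Hypothetical examples.
base = [("a", (), "qa"), ("f", ("qa", "qa"), "qf")]
one_term = is_singleton(base, {"qf"})                           # {f(a, a)}
several_terms = is_singleton(base + [("b", (), "qa")], {"qf"})  # four terms
same_term_twice = is_singleton([("a", (), "q1"), ("a", (), "q2")], {"q1", "q2"})
```

Counts only increase and each state changes at most twice, so the outer loop runs a number of times bounded by 2|Q| + 1.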



Other complexity results for "classical" problems can be found in the
exercises. E.g., let us cite the following problem whose proof is sketched in
Exercise 11.



Ground Instance Intersection Problem 
Instance A term t, a tree automaton A. 

Answer "yes" if and only if there is at least a ground instance of t which is 
accepted by A. 

Theorem 17. The Ground Instance Intersection Problem for tree automata
is in P when t is linear, NP-complete when t is non linear and A deterministic,
EXPTIME-complete when t is non linear and A non deterministic.

1.8 Exercises

Starred exercises are discussed in the bibliographic notes.

Exercise 1. Let F = {f(,), g(), a}. Define a top-down NFTA, a NFTA and a DFTA
for the set G(t) of ground instances of term t = f(f(a, x), g(y)) which is defined by
G(t) = {f(f(a, u), g(v)) | u, v ∈ T(F)}. Is it possible to define a top-down DFTA for
this language?

Exercise 2. Let F = {f(,), g(), a}. Define a top-down NFTA, a NFTA and a DFTA
for the set M(t) of terms which have a ground instance of term t = f(a, g(x)) as a
subterm, that is M(t) = {C[f(a, g(u))] | C ∈ C(F), u ∈ T(F)}. Is it possible to define
a top-down DFTA for this language?

Exercise 3. Let F = {g(), a}. Is the set of ground terms whose height is even
recognizable? Let F = {f(,), g(), a}. Is the set of ground terms whose height is even
recognizable?

Exercise 4. Let F = {f(,), a}. Prove that the set L = {f(t, t) | t ∈ T(F)} is
not recognizable. Let F be any ranked alphabet which contains at least one constant
symbol a and one binary symbol f(,). Prove that the set L = {f(t, t) | t ∈ T(F)} is
not recognizable.

Exercise 5. Prove the equivalence between top-down NFTA and NFTA.

Exercise 6. Let F = {f(,), g(), a} and F' = {f'(,), g(), a}. Let us consider the
tree homomorphism h determined by h_F defined by: h_F(f) = f'(x_1, x_2),
h_F(g) = f'(x_1, x_1), and h_F(a) = a. Is h(T(F)) recognizable? Let
L_1 = {g^i(a) | i > 0}, then L_1 is a recognizable tree language, is h(L_1)
recognizable? Let L_2 be the recognizable tree language defined by L_2 = L(A)
where A = (Q, F, Q_f, Δ) is defined by: Q = {q_a, q_g, q_f}, Q_f = {q_f}, and Δ
is the following set of transition rules:

{ a → q_a             g(q_a) → q_g
  f(q_a, q_a) → q_f   f(q_g, q_g) → q_f
  f(q_a, q_g) → q_f   f(q_g, q_a) → q_f
  f(q_a, q_f) → q_f   f(q_f, q_a) → q_f
  f(q_g, q_f) → q_f   f(q_f, q_g) → q_f
  f(q_f, q_f) → q_f }.

Is h(L_2) recognizable?

Exercise 7. Let F_1 = {or(,), and(,), not(), 0, 1, x}. A ground term over F_1 can be
viewed as a boolean formula over variable x. Define a DFTA which recognizes the set
of satisfiable boolean formulae over x. Let F_n = {or(,), and(,), not(), 0, 1, x_1, ..., x_n}.
A ground term over F_n can be viewed as a boolean formula over variables x_1, ..., x_n.
Define a DFTA which recognizes the set of satisfiable boolean formulae over x_1, ..., x_n.

Exercise 8. Let t be a linear term in T(F, X). Prove that the set G(t) of ground
instances of term t is recognizable. Let R be a finite set of linear terms in T(F, X).
Prove that the set G(R) of ground instances of set R is recognizable.

Exercise 9. * Let R be a finite set of linear terms in T(F, X). We define the set
Red(R) of reducible terms for R to be the set of ground terms which have a ground
instance of some term in R as a subterm.

1. Prove that the set Red(R) is recognizable.

2. Prove that the number of states of a DFTA recognizing Red(R) can be at least
2^(n-1) where n is the size (number of nodes) of R. Hint: Consider the set reduced
to the pattern h(f(x_1, f(x_2, f(x_3, ..., f(x_{p-1}, f(a, x_p)) ...)))).

3. Let us now suppose that R is a finite set of ground terms. Prove that we can
construct a DFTA recognizing Red(R) whose number of states is at most n + 2
where n is the number of different subterms of R.

Exercise 10. * Let R be a finite set of linear terms in T(F, X). A term t is inductively
reducible for R if all the ground instances of term t are reducible for R. Prove that
inductive reducibility of a linear term t for a set of linear terms R is decidable.

Exercise 11. *

We consider the following decision problem:

Instance t a term in T(F, X) and A a NFTA.

Answer "yes" if and only if at least one ground instance of t is accepted by A.

1. Let us first suppose that t is linear; prove that the property is in P.

Hint: a NFTA for the set of ground instances of t can be computed polynomially
(see Exercise 8).

2. Let us now suppose that t is non linear but that A is deterministic.

(a) Prove that the property is in NP. Hint: we just have to guess a substitution
of the variables of t by states.

(b) Prove that the property is NP-hard.

Hint: just consider a term t which represents a boolean formula and A a
DFTA which accepts valid formulas.



TATA — September 6, 2005 



3. Let us now suppose that t is non linear and that A is non deterministic.
Prove that the property is EXPTIME-complete.
Hint: use the EXPTIME-hardness of intersection non-emptiness.

Exercise 12. * We consider the following two problems. First, given as instance a
recognizable tree language L and a tree homomorphism h, is the set h(L) recognizable?
Second, given as instance a set R of terms in T(F, X), is the set Red(R)
recognizable? Prove that if the first problem is decidable, the second problem is
easily shown decidable.

Exercise 13. Let F = {f(,), a, b}.

1. Let us consider the set of ground terms L_1 defined by the following two
conditions:

• f(a, b) ∈ L_1,

• t ∈ L_1 ⇒ f(a, f(t, b)) ∈ L_1.

Prove that the set L_1 is recognizable.

2. Prove that the set L_2 = {t ∈ T(F) | |t|_a = |t|_b} is not recognizable where |t|_a
(respectively |t|_b) denotes the number of a (respectively the number of b) in t.

3. Let L be a recognizable tree language over F. Let us suppose that f is a
commutative symbol. Let C(L) be the congruence closure of set L for the set
of equations C = {f(x, y) = f(y, x)}. Prove that C(L) is recognizable.

4. Let L be a recognizable tree language over F. Let us suppose that f is a
commutative and associative symbol. Let AC(L) be the congruence closure of set L
for the set of equations AC = {f(x, y) = f(y, x); f(x, f(y, z)) = f(f(x, y), z)}.
Prove that in general AC(L) is not recognizable.

5. Let L be a recognizable tree language over F. Let us suppose that f is an
associative symbol. Let A(L) be the congruence closure of set L for the set of
equations A = {f(x, f(y, z)) = f(f(x, y), z)}. Prove that in general A(L) is not
recognizable.

Exercise 14. * Consider the complement problem:

• Instance A term t ∈ T(F, X) and terms t_1, ..., t_n,

• Question There is a ground instance of t which is not an instance of any t_i.

Prove that the complement problem is decidable whenever term t and all terms t_i are
linear. Extend the proof to handle the case where t is a term (not necessarily linear).

Exercise 15. * Let F be a ranked alphabet and suppose that F contains some symbols
which are commutative and associative. The set of ground AC-instances of a term t is
the AC-congruence closure of set G(t). Prove that the set of ground AC-instances of a
linear term is recognizable. The reader should note that the set of ground AC-instances
of a set of linear terms is not recognizable (see Exercise 13).

Prove that the AC-complement problem is decidable where the AC-complement
problem is defined by:

• Instance A linear term t ∈ T(F, X) and linear terms t_1, ..., t_n,

• Question There is a ground AC-instance of t which is not an AC-instance of
any t_i.

Exercise 16. * Let F be a ranked alphabet and X be a countable set of variables.
Let S be a rewrite system on T(F, X) (the reader is referred to [DJ90]) and L be a
set of ground terms. We denote by S*(L) the set of reductions of terms in L by S and
by S(L) the set of ground S-normal forms of set L. Formally,

S*(L) = {t ∈ T(F) | ∃u ∈ L, u →*_S t},

S(L) = {t ∈ T(F) | t ∈ IRR(S) and ∃u ∈ L, u →*_S t} = IRR(S) ∩ S*(L)

where IRR(S) denotes the set of ground irreducible terms for S. We consider the two
following decision problems:

(1st order reachability)

• Instance A rewrite system S, two ground terms u and v,

• Question v ∈ S*({u}).

(2nd order reachability)

• Instance A rewrite system S, two recognizable tree languages L and L',

• Question S*(L) ⊆ L'.

1. Let us suppose that rewrite system S satisfies:

(PreservRec) If L is recognizable, then S*(L) is recognizable.

What can be said about the two reachability decision problems? Give a sufficient
condition on rewrite system S satisfying (PreservRec) such that S satisfies
(NormalFormRec) where (NormalFormRec) is defined by:

(NormalFormRec) If L is recognizable, then S(L) is recognizable.

2. Let F = {f(,), g(), h(), a}. Let L = {f(t_1, t_2) | t_1, t_2 ∈ T({g(), h(), a})}, and S
is the following set of rewrite rules:

{ f(g(x), h(y)) → f(x, y)   f(h(x), g(y)) → f(x, y)

  g(h(x)) → x               h(g(x)) → x

  f(a, x) → x               f(x, a) → x }

Are the sets L, S*(L), and S(L) recognizable?

3. Let F = {f(,), g(), h(), a}. Let L = {g(h^n(a)) | n > 0}, and S is the following
set of rewrite rules:

{ g(x) → f(x, x) }

Are the sets L, S*(L), and S(L) recognizable?

4. Let us suppose now that rewrite system S is linear and monadic, i.e. all rewrite
rules are of one of the following three types:

(1) l → a, a ∈ F_0

(2) l → x, x ∈ Var(l)

(3) l → f(x_1, ..., x_p), x_1, ..., x_p ∈ Var(l), f ∈ F_p

where l is a linear term (no variable occurs more than once in l) whose height
is greater than 1. Prove that a linear and monadic rewrite system satisfies
(PreservRec). Prove that (PreservRec) is false if the right-hand side of rules of
type (3) may be non linear.

Exercise 17. Design a linear-time algorithm for testing emptiness of the language
recognized by a tree automaton:

Instance A tree automaton.

Answer "yes" if and only if the language recognized is empty.

Hint: Choose a suitable data structure for the automaton. For example, a state
could be associated with the list of the "addresses" of the rules whose left-hand side
contains it (a rule may be repeated in this list); each rule could be just represented by
a counter initialized at the arity of the corresponding symbol and by the state of the
right-hand side. Activating a state will decrement the counters of the corresponding
rules. When the counter of a rule becomes null, the rule can be applied: the right-hand
side state can be activated.

Exercise 18.

The Solvable Path Problem is the following:

Instance a finite set X and three sets R ⊆ X × X × X, X_s ⊆ X and X_t ⊆ X.

Answer "yes" if and only if X_t ∩ A is non empty, where A is the least subset of X
such that X_s ⊆ A and if y, z ∈ A and (x, y, z) ∈ R, then x ∈ A.

Prove that this P-complete problem is log-space reducible to the emptiness
problem for tree automata.

Exercise 19. A flat automaton is a tree automaton which has the following property:
there is an ordering ≥ on the states and a particular state q_⊤ such that the transition
rules have one of the following forms:

1. f(q_⊤, ..., q_⊤) → q_⊤

2. f(q_1, ..., q_n) → q with q > q_i for every i

3. f(q_⊤, ..., q_⊤, q, q_⊤, ..., q_⊤) → q

Moreover, we assume that all terms are accepted in the state q_⊤. (The automaton is
called flat because there are no "nested loops".)

Prove that the intersection of two flat automata is a finite union of automata whose
size is linear in the sum of the original automata. (This contrasts with the construction
of Theorem 5 in which the intersection automaton's size is the product of the sizes of
its components.)

Deduce from the above result that the intersection non-emptiness problem for flat
automata is in NP (compare with Theorem 12).

1.9 Bibliographic Notes

Tree automata were introduced by Doner [Don65, Don70] and Thatcher and
Wright [TW65, TW68]. Their goal was to prove the decidability of the weak
second order theory of multiple successors. The original definitions are based
on the algebraic approach and involve heavy use of universal algebra and/or
category theory.

Many of the basic results presented in this chapter are the straightforward
generalization of the corresponding results for finite automata. It is difficult to
attribute a particular result to any one paper. Thus, we only give a list of some
important contributions consisting of the above mentioned papers of Doner,
Thatcher and Wright and also Eilenberg and Wright [EW67], Thatcher [Tha70],
Brainerd [Bra68, Bra69], Arbib and Give'on [AG68]. All the results of this
chapter and a more complete and detailed list of references can be found in the
textbook of Gécseg and Steinby [GS84] and also in their recent survey [GS96].
For an overview of the notion of recognizability in general algebraic structures
see Courcelle [Cou89] and the fundamental paper of Mezei and Wright [MW67].
In Nivat and Podelski [NP89] and [Pod92], the theory of recognizable tree
languages is reduced to the theory of recognizable sets in an infinitely generated
free monoid.

The results of Sections 1.1, 1.2, and 1.3 were noted in many of the papers
mentioned above, but, in this textbook, we present these results in the style of
the undergraduate textbook on finite automata by Hopcroft and Ullman [HU79].
Tree homomorphisms were defined as a special case of tree transducers, see
Thatcher [Tha73]. The reader is referred to the bibliographic notes in Chapter 6
of the present textbook for detailed references. The reader should note that our
proof of preservation of recognizability by tree homomorphisms and inverse tree
homomorphisms is a direct construction using FTA. A more classical proof can
be found in [GS84] and uses regular tree grammars (see Chapter 2).

Minimal tree recognizers and Nerode's congruence appear in Brainerd [Bra68,
Bra69], Arbib and Give'on [AG68], and Eilenberg and Wright [EW67]. The
proof we presented here is by Kozen [Koz92] (see also Fülöp and Vágvölgyi [FV89]).
Top-down tree automata were first defined by Rabin [Rab69]. The reader is
referred to [GS84] and [GS96] for more references and for the study of some
subclasses of recognizable tree languages such as the tree languages recognized
by deterministic top-down tree automata. An alternative definition of
deterministic top-down tree automata was given in [NP97], leading to "homogeneous"
tree languages; a minimization algorithm was also given.

Some results of Section 1.7 are "folklore" results. Complexity results for
the membership problem and the uniform membership problem can be found
in [Loh01]. Other interesting complexity results for tree automata can be found
in Seidl [Sei89], [Sei90]. The EXPTIME-hardness of the problem of intersection
non-emptiness is often used; this problem is close to problems of type
inference and an idea of the proof can be found in [FSVY91]. A proof for
deterministic top-down automata can be found in [Sei94b]. A detailed proof in
the deterministic bottom-up case as well as some other complexity results are
in [Vea97a], [Vea97b].

We have only considered finite ordered ranked trees. Unranked trees are
used for XML Document Type Definitions and more generally for XML schema
languages [MLM01]. The theory of unranked trees dates back to Thatcher. All
the fundamental results for finite tree automata can be extended to the case of
unranked trees and the methods are similar [BKMW01]. Another extension is
to consider unordered trees. A general discussion about unordered and unranked
trees can be found in the bibliographical notes of Section 4.

Numerous exercises of the present chapter illustrate applications of tree
automata theory to automated deduction and to the theory of rewriting systems.
These applications are studied in more detail in Section 3.4. Results about tree
automata and rewrite systems are collected in Gilleron and Tison [GT95]. Let
S be a term rewrite system (see for example Dershowitz and Jouannaud [DJ90]
for a survey on rewrite systems); if S is left-linear, the set IRR(S) of irreducible
ground terms w.r.t. S is a recognizable tree language. This result first appears
in Gallier and Book [GB85] and is the subject of Exercise 9. However not every
recognizable tree language is the set of irreducible terms w.r.t. a rewrite system
S (see Fülöp and Vágvölgyi [FV88]). It was proved that the problem whether,
given a rewrite system S as instance, the set of irreducible terms is recognizable
is decidable (Kucherov [Kuc91]). The problem of preservation of regularity by
tree homomorphisms is not known to be decidable. Exercise 12 shows connections
between preservation of regularity for tree homomorphisms and recognizability
of sets of irreducible terms for rewrite systems.

The notion of inductive reducibility (or ground reducibility) was introduced
in automated deduction. A term t is S-inductively (or S-ground) reducible
if all the ground instances of term t are reducible for S. Inductive reducibility
is decidable for a linear term t and a left-linear rewrite system S. This is
Exercise 10, see also Section 3.4.2. Inductive reducibility is decidable for finite S
(see Plaisted [Pla85]). Complement problems are also introduced in automated
deduction. They are the subject of Exercises 14 and 15. The complement
problem for linear terms was proved decidable by Lassez and Marriott [LM87]
and the AC-complement problem by Lugiez and Moysset [LM94].

The reachability problem is defined in Exercise 16. It is well known that this
problem is undecidable in general. It is decidable for rewrite systems preserving
recognizability, i.e. such that for every recognizable tree language L, the set
of reductions of terms in L by S is recognizable. This is true for linear and
monadic rewrite systems (right-hand sides have depth less than 1). This result
was obtained by K. Salomaa [Sal88] and is the matter of Exercise 16. This is
true also for linear and semi-monadic (variables in the right-hand sides have
depth at most 1) rewrite systems, Coquidé et al. [CDGV94]. Other interesting
results can be found in [Jac96] and [NT99].

Chapter 2

Regular Grammars and Regular Expressions

2.1 Tree Grammar

In the previous chapter, we have studied tree languages from the acceptor point
of view, using tree automata and defining recognizable languages. In this
chapter we study languages from the generation point of view, using regular tree
grammars and defining regular tree languages. We shall see that the two
notions are equivalent and that many properties and concepts on regular word
languages smoothly generalize to regular tree languages, and that algebraic
characterizations of regular languages do exist for tree languages. Actually, this
is not surprising since tree languages can be seen as word languages on an
infinite alphabet of contexts. We shall show also that the set of derivation trees of
a context-free language is a regular tree language.

2.1.1 Definitions

When we write programs, we often have to know how to produce the elements of
the data structures that we use. For instance, a definition of the lists of integers
in a functional language like ML is similar to the following definition:

Nat = 0 | s(Nat)

List = nil | cons(Nat, List)

This definition is nothing but a tree grammar in disguise; more precisely, the
set of lists of integers is the tree language generated by the grammar with axiom
List, non-terminal symbols List, Nat, terminal symbols 0, s, nil, cons and rules

Nat → 0

Nat → s(Nat)

List → nil

List → cons(Nat, List)


Tree grammars are similar to word grammars except that basic objects are
trees, therefore terminals and non-terminals may have an arity greater than 0.
More precisely, a tree grammar G = (S, N, F, R) is composed of an axiom
S, a set N of non-terminal symbols with S ∈ N, a set F of terminal
symbols, a set R of production rules of the form α → β where α, β are trees
of T(F ∪ N ∪ X) where X is a set of dummy variables and α contains at least one
non-terminal. Moreover we require that F ∩ N = ∅, that each element of N ∪ F
has a fixed arity and that the arity of the axiom S is 0. In this chapter, we
shall concentrate on regular tree grammars where a regular tree grammar
G = (S, N, F, R) is a tree grammar such that all non-terminal symbols have
arity 0 and production rules have the form A → β, with A a non-terminal of N
and β a tree of T(F ∪ N).

Example 17. The grammar G with axiom List, non-terminals List, Nat,
terminals 0, nil, s(), cons(,), rules

List → nil

List → cons(Nat, List)

Nat → 0

Nat → s(Nat)

is a regular tree grammar.

A tree grammar is used to build terms from the axiom, using the
corresponding derivation relation. Basically the idea is to replace a non-terminal
A by the right-hand side α of a rule A → α. More precisely, given a regular
tree grammar G = (S, N, F, R), the derivation relation →_G associated to
G is a relation on pairs of terms of T(F ∪ N) such that s →_G t if and only
if there are a rule A → α ∈ R and a context C such that s = C[A] and
t = C[α]. The language generated by G, denoted by L(G), is the set of
terms of T(F) which can be reached by successive derivations starting from the
axiom, i.e. L(G) = {s ∈ T(F) | S →+_G s} with →+_G the transitive closure of →_G.
We write → instead of →_G when the grammar G is clear from the context. A
regular tree language is a language generated by a regular tree grammar.

Example 18. Let G be the grammar of the previous example, then a derivation
of cons(s(0), nil) from List is

List →_G cons(Nat, List) →_G cons(s(Nat), List) →_G cons(s(Nat), nil)
→_G cons(s(0), nil)

and the language generated by G is the set of lists of non-negative integers.
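
Whether a given ground term can be derived from a non-terminal can be checked by matching the term against right-hand sides. The sketch below is only an illustration under assumptions introduced here: a right-hand side is either a string (a non-terminal) or a nested tuple (symbol, child_1, ..., child_n), ground terms use the same tuple encoding, and no rule has a bare non-terminal as its whole right-hand side, so that each recursive call consumes a node of the term.

```python
def generates(rules, nonterminal, term):
    """True if term belongs to the language generated from nonterminal."""
    return any(matches(rules, rhs, term) for rhs in rules.get(nonterminal, ()))


def matches(rules, pattern, term):
    """Match a right-hand side (a tree over terminals and non-terminals)."""
    if isinstance(pattern, str):  # a non-terminal leaf
        return generates(rules, pattern, term)
    return (
        pattern[0] == term[0]
        and len(pattern) == len(term)
        and all(matches(rules, p, t) for p, t in zip(pattern[1:], term[1:]))
    )


# The grammar of Example 17 in the assumed encoding.
rules = {
    "List": [("nil",), ("cons", "Nat", "List")],
    "Nat": [("0",), ("s", "Nat")],
}

# cons(s(0), nil), the term derived in Example 18.
derived = generates(rules, "List", ("cons", ("s", ("0",)), ("nil",)))
# s(0) is a natural number, not a list.
not_a_list = generates(rules, "List", ("s", ("0",)))
```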

From the example, we can see that trees are generated top-down by replacing
a leaf by some other term. When A is a non-terminal of a regular tree grammar
G, we denote by L_G(A) the language generated by the grammar G' identical to
G but with A as axiom. When there is no ambiguity on the grammar referred to,
we drop the subscript G. We say that two grammars G and G' are equivalent
when they generate the same language. Grammars can contain useless rules or
non-terminals and we want to get rid of these while preserving the generated
language. A non-terminal is reachable if there is a derivation from the axiom
containing this non-terminal. A non-terminal A is productive if L_G(A) is
non-empty. A regular tree grammar is reduced if and only if all its non-terminals
are reachable and productive. We have the following result:

Proposition 2. A regular tree grammar is equivalent to a reduced regular tree
grammar.

Proof. Given a grammar G = (S, N, F, R), we can compute the set of reachable
non-terminals and the set of productive non-terminals using the sequences
(Reach)_n and (Prod)_n which are defined in the following way.

Prod_0 = ∅
Prod_n = Prod_{n-1} ∪ {A ∈ N | ∃(A → α) ∈ R s.t. each non-terminal of α is in Prod_{n-1}}

Reach_0 = {S}
Reach_n = Reach_{n-1} ∪ {A ∈ N | ∃(A' → α) ∈ R s.t. A' ∈ Reach_{n-1} and A occurs in α}


For each séquence, there is an index such that ail éléments of the séquence 
with greater index are identical and this élément is the set of productive (resp. 
reachable) non-terminals of G. Each regular tree grammar is équivalent to a 
reduced tree grammar which is computed by the following cleaning algorithm. 

Computation of an equivalent reduced grammar
input: a regular tree grammar $G = (S, N, \mathcal{F}, R)$.

1. Compute the set of productive non-terminals $N_{Prod} = \bigcup_{n \geq 0} Prod_n$ for $G$
and let $G' = (S, N_{Prod}, \mathcal{F}, R')$ where $R'$ is the subset of $R$ involving rules
containing only productive non-terminals.

2. Compute the set of reachable non-terminals $N_{Reach} = \bigcup_{n \geq 0} Reach_n$ for
$G'$ (not $G$) and let $G'' = (S, N_{Reach}, \mathcal{F}, R'')$ where $R''$ is the subset of $R'$
involving rules containing only reachable non-terminals.

output: $G''$

The equivalence of $G$, $G'$ and $G''$ is left to the reader. Moreover each
non-terminal $A$ of $G''$ must appear in a derivation $S \to^*_{G''} C[A] \to^*_{G''} C[s]$ which
proves that $G''$ is reduced. The reader should notice that exchanging the two
steps of the computation may result in a grammar which is not reduced (see
Exercise 22).

□
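
The cleaning algorithm can be sketched in a few lines. This is an illustration under assumptions: rules are encoded as pairs (left-hand side, right-hand side term), terms are nested tuples, non-terminals are bare strings, and the sample grammar `RULES` is hypothetical. The axiom is assumed to be productive.

```python
def nonterminals_in(term):
    """Non-terminals (bare strings) occurring in a term given as nested tuples."""
    if isinstance(term, str):
        return {term}
    return set().union(*(nonterminals_in(c) for c in term[1:]))


def productive(rules):
    """Limit of the sequence Prod_n."""
    prod = set()
    while True:
        new = prod | {a for a, rhs in rules if nonterminals_in(rhs) <= prod}
        if new == prod:
            return prod
        prod = new


def reachable(axiom, rules):
    """Limit of the sequence Reach_n."""
    reach = {axiom}
    while True:
        new = reach | {b for a, rhs in rules if a in reach
                       for b in nonterminals_in(rhs)}
        if new == reach:
            return reach
        reach = new


def reduce_grammar(axiom, rules):
    """Step 1 (keep productive), then step 2 (keep reachable) on the result."""
    prod = productive(rules)
    rules1 = [(a, r) for a, r in rules
              if a in prod and nonterminals_in(r) <= prod]
    reach = reachable(axiom, rules1)
    rules2 = [(a, r) for a, r in rules1 if a in reach]
    return reach, rules2


# Hypothetical grammar: B is not productive, and A is reachable only
# through the rule S -> f(A, B), which mentions B.
RULES = [
    ("S", ("a",)),
    ("S", ("f", "A", "B")),
    ("A", ("a",)),
    ("B", ("g", "B")),
]
nonterminals, cleaned = reduce_grammar("S", RULES)
print(nonterminals, cleaned)
```

On this grammar the order of the steps matters: computing reachability first keeps A (it is reachable in the original grammar), and the later removal of the rule S → f(A, B) then leaves A unreachable, so the result would not be reduced.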

Actually, we shall use even simpler grammars, i.e. normalized regular tree
grammars, where the production rules have the form $A \to f(A_1, \ldots, A_n)$ or
$A \to a$ where $f$, $a$ are symbols of $\mathcal{F}$ and $A, A_1, \ldots, A_n$ are non-terminals. The
following result shows that this is not a restriction.

Proposition 3. A regular tree grammar is equivalent to a normalized regular
tree grammar.

Proof. Replace a rule $A \to f(s_1, \ldots, s_n)$ by $A \to f(A_1, \ldots, A_n)$ with $A_i = s_i$ if
$s_i \in N$, otherwise $A_i$ is a new non-terminal. In the last case add the rule $A_i \to s_i$.
Iterate this process until one gets a (necessarily equivalent) grammar with rules
of the form $A \to f(A_1, \ldots, A_n)$ or $A \to a$ or $A_1 \to A_2$. The last rules are
replaced by the rules $A_1 \to \alpha$ for all $\alpha \notin N$ such that $A_1 \to^+ A_i$ and $A_i \to \alpha \in R$
(these $A_i$'s are easily computed using a transitive closure algorithm). □

From now on, we assume that all grammars are normalized, unless this is
stated otherwise explicitly.

2.1.2 Regularity and Recognizability

Given some normalized regular tree grammar $G = (S, N, \mathcal{F}, R_G)$, we show how
to build a top-down tree automaton which recognizes $L(G)$. We define
$\mathcal{A} = (Q, \mathcal{F}, I, \Delta)$ by

• $Q = \{q_A \mid A \in N\}$

• $I = \{q_S\}$

• $q_A(f(x_1, \ldots, x_n)) \to f(q_{A_1}(x_1), \ldots, q_{A_n}(x_n)) \in \Delta$ if and only if
$A \to f(A_1, \ldots, A_n) \in R_G$.

A standard proof by induction on derivation length yields $L(G) = L(\mathcal{A})$.
Therefore we have proved that the languages generated by regular tree grammars are
recognizable languages.
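
The construction amounts to reading each rule of the grammar as a top-down transition. The sketch below assumes the list grammar used earlier in the chapter, names each state after its non-terminal, and treats a rule $A \to a$ as a transition with no argument; the function names are illustrative.

```python
# DELTA maps a state (named after its non-terminal) to the right-hand sides
# of the rules A -> f(A1, ..., An); a constant is a rule with no argument.
DELTA = {
    "List": [("nil",), ("cons", "Nat", "List")],
    "Nat": [("0",), ("s", "Nat")],
}
INITIAL = {"List"}  # the state q_S for the axiom S = List


def run_from(state, term, delta):
    """True if the top-down automaton started in `state` accepts `term`."""
    head, *children = term
    return any(
        rhs[0] == head
        and len(rhs) - 1 == len(children)
        and all(run_from(q, c, delta) for q, c in zip(rhs[1:], children))
        for rhs in delta[state]
    )


def accepts(term, delta=DELTA, initial=INITIAL):
    """Acceptance from some initial state."""
    return any(run_from(q, term, delta) for q in initial)


print(accepts(("cons", ("s", ("0",)), ("nil",))))   # cons(s(0), nil)
print(accepts(("cons", ("nil",), ("nil",))))        # cons(nil, nil)
```

The recursion mirrors the induction on derivation length: a term is accepted from $q_A$ exactly when some rule for $A$ matches its root and the subterms are accepted from the corresponding states.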

The next question to ask is whether recognizable tree languages can be
generated by regular tree grammars. If $L$ is a recognizable tree language, there
exists a top-down tree automaton $\mathcal{A} = (Q, \mathcal{F}, I, \Delta)$ such that $L = L(\mathcal{A})$. We
define $G = (S, N, \mathcal{F}, R_G)$ with $S$ a new symbol, $N = \{A_q \mid q \in Q\}$,
$R_G = \{A_q \to f(A_{q_1}, \ldots, A_{q_n}) \mid q(f(x_1, \ldots, x_n)) \to f(q_1(x_1), \ldots, q_n(x_n)) \in \Delta\} \cup \{S \to A_q \mid q \in I\}$.
A standard proof by induction on derivation length yields $L(G) = L(\mathcal{A})$.

Combining these two properties, we get the equivalence between
recognizability and regularity.

Theorem 18. A tree language is recognizable if and only if it is a regular tree 
language. 

2.2 Regular Expressions. Kleene's Theorem for 
Tree Languages 

Going back to our example of lists of non-negative integers, we can write the
sets defined by the non-terminals Nat and List as follows.

$Nat = \{0, s(0), s(s(0)), \ldots\}$

$List = \{nil, cons(\_, nil), cons(\_, cons(\_, nil)), \ldots\}$

where _ stands for any element of Nat. There is some regularity in each set
which is reminiscent of the regularity obtained with regular word expressions
constructed with the union, concatenation and iteration operators. Therefore we
can try to use the same idea to denote the sets Nat and List. However, since we
are dealing with trees and not words, we must put some information to indicate
where concatenation and iteration must take place. This is done by using a
new symbol which behaves as a constant. Moreover, since we have two
independent iterations, the first one for Nat and the second one for List, we shall use
two different new symbols $\Box_1$ and $\Box_2$ and a natural extension of regular word
expressions leads us to denote the sets Nat and List as follows.

$Nat = s(\Box_1)^{*,\Box_1} \;._{\Box_1}\; 0$

$List = nil + cons\big((s(\Box_1)^{*,\Box_1} \;._{\Box_1}\; 0), \Box_2\big)^{*,\Box_2} \;._{\Box_2}\; nil$

Actually the first term nil in the second equality is redundant and a shorter
(but slightly less natural) expression yields the same language.

We are going to show that this is a general phenomenon and that we can
define a notion of regular expressions for trees and that Kleene's theorem for
words can be generalized to trees. As in the example, we must introduce a
particular set of constants $\mathcal{K}$, which are used to indicate the positions where
concatenation and iteration take place in trees. This explains why the syntax
of regular tree expressions is more cumbersome than the syntax of word regular
expressions. These new constants are usually denoted by $\Box_1, \Box_2, \ldots$. Therefore,
in this section, we consider trees constructed on $\mathcal{F} \cup \mathcal{K}$ where $\mathcal{K}$ is a distinguished
finite set of symbols of arity 0 disjoint from $\mathcal{F}$.

2.2.1 Substitution and Iteration

First, we have to generalize the notion of substitution to languages, replacing
some $\Box_i$ by a tree of some language $L_i$. The main difference with term
substitution is that different occurrences of the same constant $\Box_i$ can be replaced
by different terms of $L_i$. Given a tree $t$ of $T(\mathcal{F} \cup \mathcal{K})$, $\Box_1, \ldots, \Box_n$ symbols of $\mathcal{K}$
and $L_1, \ldots, L_n$ languages of $T(\mathcal{F} \cup \mathcal{K})$, the tree substitution (substitution for
short) of $\Box_1, \ldots, \Box_n$ by $L_1, \ldots, L_n$ in $t$, denoted by $t\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$, is
the tree language defined by the following identities.

• $\Box_i\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\} = L_i$ for $i = 1, \ldots, n$,

• $a\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\} = \{a\}$ for all $a \in \mathcal{F} \cup \mathcal{K}$ such that the arity of $a$ is 0
and $a \neq \Box_1, \ldots, a \neq \Box_n$,

• $f(s_1, \ldots, s_n)\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\} = \{f(t_1, \ldots, t_n) \mid t_i \in s_i\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}\}$

Example 19. Let $\mathcal{F} = \{0, nil, s(), cons(,)\}$ and $\mathcal{K} = \{\Box_1, \Box_2\}$, let

$t = cons(\Box_1, cons(\Box_1, \Box_2))$

and let

$L_1 = \{0, s(0)\}$

then

$t\{\Box_1 \leftarrow L_1\} = \{cons(0, cons(0, \Box_2)),$
$\quad cons(0, cons(s(0), \Box_2)),$
$\quad cons(s(0), cons(0, \Box_2)),$
$\quad cons(s(0), cons(s(0), \Box_2))\}$
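
The three identities defining tree substitution translate directly into a recursive function. The following is a sketch under assumptions: terms are nested tuples, the constants of $\mathcal{K}$ are written `box1` and `box2`, and `substitute` is a name chosen for this illustration.

```python
from itertools import product


def substitute(t, assignment):
    """Return t{box_1 <- L_1, ...}; `assignment` maps box names to languages."""
    head, *children = t
    if not children:
        # A constant: replaced by its language if it is an assigned box,
        # otherwise kept as the singleton {t}.
        return set(assignment.get(head, {t}))
    # Each child is substituted independently, so two occurrences of the
    # same box may be replaced by different terms.
    choices = [substitute(c, assignment) for c in children]
    return {(head, *combo) for combo in product(*choices)}


ZERO, BOX1, BOX2 = ("0",), ("box1",), ("box2",)
t = ("cons", BOX1, ("cons", BOX1, BOX2))
L1 = {ZERO, ("s", ZERO)}
result = substitute(t, {"box1": L1})
for term in sorted(result):
    print(term)
```

The result has four elements because the two occurrences of `box1` are replaced independently, which is exactly the difference with ordinary term substitution.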

Symbols of $\mathcal{K}$ are mainly used to distinguish places where the substitution
must take place, and they are usually not relevant. For instance, if $t$ is a tree
on the alphabet $\mathcal{F} \cup \{\Box\}$ and $L$ is a language of trees on the alphabet $\mathcal{F}$, then
the trees of $t\{\Box \leftarrow L\}$ don't contain the symbol $\Box$.

The substitution operation generalizes to languages in a straightforward way.
When $L, L_1, \ldots, L_n$ are languages of $T(\mathcal{F} \cup \mathcal{K})$ and $\Box_1, \ldots, \Box_n$ are elements of
$\mathcal{K}$, we define $L\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$ to be the set
$\bigcup_{t \in L} t\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$.

Now, we can define the concatenation operation for tree languages. Given $L$
and $M$ two languages of $T(\mathcal{F} \cup \mathcal{K})$, and $\Box$ an element of $\mathcal{K}$, the concatenation
of $M$ to $L$ through $\Box$, denoted by $L \;._{\Box}\; M$, is the set of trees obtained by
substituting the occurrences of $\Box$ in trees of $L$ by trees of $M$, i.e.
$L \;._{\Box}\; M = \bigcup_{t \in L} t\{\Box \leftarrow M\}$.

To define the closure of a language, we must define the sequence of successive
iterations. Given $L$ a language of $T(\mathcal{F} \cup \mathcal{K})$ and $\Box$ an element of $\mathcal{K}$, the sequence
$L^{n,\Box}$ is defined by the equalities

• $L^{0,\Box} = \{\Box\}$

• $L^{n+1,\Box} = L^{n,\Box} \cup L \;._{\Box}\; L^{n,\Box}$

The closure $L^{*,\Box}$ of $L$ is the union of all $L^{n,\Box}$ for non-negative $n$, i.e.,
$L^{*,\Box} = \bigcup_{n \geq 0} L^{n,\Box}$. From the definition, one gets that $\{\Box\} \subseteq L^{*,\Box}$ for any $L$.



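Example 20. Let $\mathcal{F} = \{0, nil, s(), cons(,)\}$, let $L = \{0, cons(0, \Box)\}$ and
$M = \{nil, cons(s(0), \Box)\}$, then

$L \;._{\Box}\; M = \{0, cons(0, nil), cons(0, cons(s(0), \Box))\}$
$L^{*,\Box} = \{\Box\} \cup$
$\quad \{0, cons(0, \Box)\} \cup$
$\quad \{0, cons(0, \Box), cons(0, 0), cons(0, cons(0, \Box))\} \cup \ldots$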
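
Concatenation and the first iterates of the closure can be computed directly from the definitions. The sketch below assumes a single constant $\Box$, written `BOX`, and terms encoded as nested tuples; `concat` and `iterate` are illustrative names.

```python
from itertools import product

BOX = ("box",)
ZERO, NIL = ("0",), ("nil",)


def substitute(t, lang):
    """t{box <- lang}: each occurrence of box is replaced independently."""
    if t == BOX:
        return set(lang)
    head, *children = t
    return {(head, *combo)
            for combo in product(*(substitute(c, lang) for c in children))}


def concat(L, M):
    """L .box M, the union of t{box <- M} over the trees t of L."""
    return set().union(*(substitute(t, M) for t in L))


def iterate(L, n):
    """The n-th iterate, starting from {box} and adding L .box (previous)."""
    current = {BOX}
    for _ in range(n):
        current = current | concat(L, current)
    return current


L = {ZERO, ("cons", ZERO, BOX)}
M = {NIL, ("cons", ("s", ZERO), BOX)}
print(sorted(concat(L, M)))
# The second iterate also contains cons(0, 0), obtained by substituting
# the tree 0 of the first iterate for box in cons(0, box).
print(sorted(iterate(L, 2)))
```

Only finitely many iterates can be enumerated this way; the closure itself is in general an infinite language.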

We prove now that the substitution and concatenation operations yield
regular languages when they are applied to regular languages.

Proposition 4. Let $L$ be a regular tree language on $\mathcal{F} \cup \mathcal{K}$, let $L_1, \ldots, L_n$ be
regular tree languages on $\mathcal{F} \cup \mathcal{K}$, let $\Box_1, \ldots, \Box_n \in \mathcal{K}$, then
$L\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$ is a regular tree language.

Proof. Since $L$ is regular, there exists some normalized regular tree grammar
$G = (S, N, \mathcal{F} \cup \mathcal{K}, R)$ such that $L = L(G)$, and for each $i = 1, \ldots, n$ there
exists a normalized grammar $G_i = (S_i, N_i, \mathcal{F} \cup \mathcal{K}, R_i)$ such that $L_i = L(G_i)$.
We can assume that the sets of non-terminals are pairwise disjoint. The idea
of the proof is to construct a grammar $G'$ which starts by generating trees like
$G$ but replaces the generation of a symbol $\Box_i$ by the generation of a tree of
$L_i$ via a branching towards the axiom of $G_i$. More precisely, we show that
$L\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\} = L(G')$ where $G' = (S, N', \mathcal{F} \cup \mathcal{K}, R')$ such that

• $N' = N \cup N_1 \cup \ldots \cup N_n$,

• $R'$ contains the rules of the $R_i$ and the rules of $R$, except the rules $A \to \Box_i$, which
are replaced by the rules $A \to S_i$, where $S_i$ is the axiom of $G_i$.



Figure 2.1: Replacement of rules $A \to \Box_i$

A straightforward induction on the height of trees proves that $G'$ generates
each tree of $L\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$.

The converse is to prove that $L(G') \subseteq L\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$. This is
achieved by proving the following property by induction on the derivation length.

$A \to^+ s'$ where $s' \in T(\mathcal{F} \cup \mathcal{K})$ using the rules of $G'$

if and only if

there is some $s$ such that $A \to^+ s$ using the rules of $G$ and
$s' \in s\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$.

• base case: $A \to s$ in one step. Therefore this derivation is a derivation of
the grammar $G$ and no $\Box_i$ occurs in $s$, yielding
$s \in L\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$

• induction step: we assume that the property is true for any non-terminal and
derivation of length less than $n$. Let $A$ be such that $A \to s'$ in $n$ steps.
This derivation can be decomposed as $A \to s_1 \to^+ s'$. We distinguish several
cases depending on the rule used in the derivation $A \to s_1$.

- the rule is $A \to f(A_1, \ldots, A_m)$, therefore $s' = f(t_1, \ldots, t_m)$ and
$t_i \in L(A_i)\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$, therefore
$s' \in L(A)\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$,

- the rule is $A \to S_i$, therefore $A \to \Box_i \in R$ and $s' \in L_i$ and
$s' \in L(A)\{\Box_1 \leftarrow L_1, \ldots, \Box_n \leftarrow L_n\}$,

- the rules $A \to a$ with $a \in \mathcal{F}$, $a$ of arity 0, $a \neq \Box_1, \ldots, a \neq \Box_n$ are not
considered since no further derivation can be done.

□

The following proposition states that regular languages are stable also under
closure.

Proposition 5. Let $L$ be a regular tree language of $T(\mathcal{F} \cup \mathcal{K})$, let $\Box \in \mathcal{K}$, then
$L^{*,\Box}$ is a regular tree language of $T(\mathcal{F} \cup \mathcal{K})$.

Proof. There exists a normalized regular grammar $G = (S, N, \mathcal{F} \cup \mathcal{K}, R)$ such
that $L = L(G)$ and we obtain from $G$ a grammar $G' = (S', N \cup \{S'\}, \mathcal{F} \cup \mathcal{K}, R')$
for $L^{*,\Box}$ by replacing rules leading to $\Box$ such as $A \to \Box$ by rules $A \to S'$ leading
to the (new) axiom. Moreover we add the rule $S' \to \Box$ to generate $\{\Box\} = L^{0,\Box}$
and the rule $S' \to S$ to generate $L^{i,\Box}$ for $i > 0$. By construction $G'$ generates
the elements of $L^{*,\Box}$.

Conversely a proof by induction on the length of the derivation proves that
$L(G') \subseteq L^{*,\Box}$. □

2.2.2 Regular Expressions and Regular Tree Languages 

Now, we can define regular tree expressions in the flavor of regular word
expressions using the $+$, $._{\Box}$, $^{*,\Box}$ operators.

Definition 2. The set $Regexp(\mathcal{F}, \mathcal{K})$ of regular tree expressions on $\mathcal{F}$ and
$\mathcal{K}$ is the smallest set such that:

• the empty set $\emptyset$ is in $Regexp(\mathcal{F}, \mathcal{K})$,

• if $a \in \mathcal{F}_0 \cup \mathcal{K}$ is a constant, then $a \in Regexp(\mathcal{F}, \mathcal{K})$,

• if $f \in \mathcal{F}_n$ has arity $n > 0$ and $E_1, \ldots, E_n$ are regular expressions of
$Regexp(\mathcal{F}, \mathcal{K})$ then $f(E_1, \ldots, E_n)$ is a regular expression of $Regexp(\mathcal{F}, \mathcal{K})$,

• if $E_1, E_2$ are regular expressions of $Regexp(\mathcal{F}, \mathcal{K})$ then $(E_1 + E_2)$ is a
regular expression of $Regexp(\mathcal{F}, \mathcal{K})$,

• if $E_1, E_2$ are regular expressions of $Regexp(\mathcal{F}, \mathcal{K})$ and $\Box$ is an element of
$\mathcal{K}$ then $E_1 \;._{\Box}\; E_2$ is a regular expression of $Regexp(\mathcal{F}, \mathcal{K})$,

• if $E$ is a regular expression of $Regexp(\mathcal{F}, \mathcal{K})$ and $\Box$ is an element of $\mathcal{K}$
then $E^{*,\Box}$ is a regular expression of $Regexp(\mathcal{F}, \mathcal{K})$.

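Each regular expression $E$ represents a set of terms of $T(\mathcal{F} \cup \mathcal{K})$ which we
denote $[\![E]\!]$ and which is formally defined by the following equalities.

• $[\![\emptyset]\!] = \emptyset$,

• $[\![a]\!] = \{a\}$ for $a \in \mathcal{F}_0 \cup \mathcal{K}$,

• $[\![f(E_1, \ldots, E_n)]\!] = \{f(s_1, \ldots, s_n) \mid s_1 \in [\![E_1]\!], \ldots, s_n \in [\![E_n]\!]\}$,

• $[\![E_1 + E_2]\!] = [\![E_1]\!] \cup [\![E_2]\!]$,

• $[\![E_1 \;._{\Box}\; E_2]\!] = [\![E_1]\!]\{\Box \leftarrow [\![E_2]\!]\}$,

• $[\![E^{*,\Box}]\!] = [\![E]\!]^{*,\Box}$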



Example 21. Let $\mathcal{F} = \{0, nil, s(), cons(,)\}$ and $\Box \in \mathcal{K}$ then

$(cons(0, \Box)^{*,\Box}) \;._{\Box}\; nil$

is a regular expression of $Regexp(\mathcal{F}, \mathcal{K})$ which denotes the set of lists of zeros:

$\{nil, cons(0, nil), cons(0, cons(0, nil)), \ldots\}$



In the remainder of this section, we compare the relative expressive power
of regular expressions and regular languages. It is easy to prove that for each
regular expression $E$, the set $[\![E]\!]$ is a regular tree language. The proof is done
by structural induction on $E$. The first three cases are obvious and the two last
cases are consequences of Propositions 5 and 4. The converse, i.e. a regular tree
language can be denoted by a regular expression, is more involved and the proof
is similar to the proof of Kleene's theorem for word languages. Let us state the
result first.

Proposition 6. Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ be a bottom-up tree automaton, then
there exists a regular expression $E$ of $Regexp(\mathcal{F}, Q)$ such that $L(\mathcal{A}) = [\![E]\!]$.

The occurrence of symbols of $Q$ in the regular expression denoting $L(\mathcal{A})$
doesn't cause any trouble since a regular expression of $Regexp(\mathcal{F}, Q)$ can denote
a language of $T(\mathcal{F})$.

Proof. The proof is similar to the proof for word languages and word automata.
For each $1 \leq i, j \leq |Q|$, $K \subseteq Q$, we define the set $T(i, j, K)$ as the set of trees
$t$ of $T(\mathcal{F} \cup K)$ such that there is a run $r$ of $\mathcal{A}$ on $t$ satisfying the following
properties:

• $r(\epsilon) = q_i$,

• $r(p) \in \{q_1, \ldots, q_j\}$ for all $p \neq \epsilon$ labelled by a function symbol.

Roughly speaking, a term is in $T(i, j, K)$ if we can reach $q_i$ at the root by
using only states in $\{q_1, \ldots, q_j\}$ when we assume that the leaves are states of $K$.
By definition, $L(\mathcal{A})$, the language accepted by $\mathcal{A}$, is the union of the $T(i, |Q|, \emptyset)$'s
for $i$ such that $q_i$ is a final state: these terms are the terms of $T(\mathcal{F})$ such
that there is a successful run using any possible state of $Q$. Now, we prove
by induction on $j$ that $T(i, j, K)$ can be denoted by a regular expression of
$Regexp(\mathcal{F}, Q)$.

• Base case $j = 0$. The set $T(i, 0, K)$ is the set of trees $t$ where the root is
labelled by $q_i$, the leaves are in $\mathcal{F} \cup K$ and no internal node is labelled
by some $q$. Therefore there exist $a_1, \ldots, a_n, a \in \mathcal{F} \cup K$ such that
$t = f(a_1, \ldots, a_n)$ or $t = a$, hence $T(i, 0, K)$ is finite and can be denoted by a
regular expression of $Regexp(\mathcal{F}, Q)$.

• Induction case. Let us assume that for any $i'$, $K' \subseteq Q$ and $0 \leq j' < j$, the
set $T(i', j', K')$ can be denoted by a regular expression. We can write the
following equality:

$T(i, j, K) = T(i, j-1, K) \;\cup\; T(i, j-1, K \cup \{q_j\}) \;._{q_j}\; T(j, j-1, K \cup \{q_j\})^{*,q_j} \;._{q_j}\; T(j, j-1, K)$

The inclusion of $T(i, j, K)$ in the right-hand side of the equality can be
easily seen from Figure 2.2.




Figure 2.2: Decomposition of a term of $T(i, j, K)$



The converse inclusion is also not difficult. By definition:

$T(i, j-1, K) \subseteq T(i, j, K)$

and an easy proof by induction on the number of occurrences of $q_j$ yields:

$T(i, j-1, K \cup \{q_j\}) \;._{q_j}\; T(j, j-1, K \cup \{q_j\})^{*,q_j} \;._{q_j}\; T(j, j-1, K) \subseteq T(i, j, K)$

By induction hypothesis, each set of the right-hand side of the equality
defining $T(i, j, K)$ can be denoted by a regular expression of $Regexp(\mathcal{F}, Q)$.
This yields the desired result because the union of these sets is represented
by the sum of the corresponding expressions.

□

Since we have already seen that regular expressions denote recognizable tree
languages and that recognizable languages are regular, we can state Kleene's
theorem for tree languages.

Theorem 19. A tree language is recognizable if and only if it can be denoted 
by a regular tree expression. 

2.3 Regular Equations 

Looking at our example of the set of lists of non-negative integers, we can
realize that these lists can be defined by equations instead of grammar rules.
For instance, denoting set union by +, we could replace the grammar given in
Section 2.1.1 by the following equations.

$Nat = 0 + s(Nat)$

$List = nil + cons(Nat, List)$

where the variables are List and Nat. To get the usual lists of non-negative
numbers, we must restrict ourselves to the least fixed-point solution of this set
of equations. Systems of language equations do not always have a solution, nor
does a least solution always exist. Therefore we shall study regular equation
systems defined as follows.

Definition 3. Let $X_1, \ldots, X_n$ be variables denoting sets of trees, for $1 \leq j \leq p$,
$1 \leq i \leq m_j$, let the $s^j_i$'s be terms over $\mathcal{F} \cup \{X_1, \ldots, X_n\}$, then a regular equation
system $S$ is a set of equations of the form:

$X_1 = s^1_1 + \ldots + s^1_{m_1}$
$\ldots$
$X_p = s^p_1 + \ldots + s^p_{m_p}$

A solution of $S$ is any n-tuple $(L_1, \ldots, L_n)$ of languages of $T(\mathcal{F})$ such that

$L_1 = s^1_1\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\} \cup \ldots \cup s^1_{m_1}\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\}$
$\ldots$
$L_p = s^p_1\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\} \cup \ldots \cup s^p_{m_p}\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\}$

Since equations with the same left-hand side can be merged into one
equation, and since we can add equations $X_k = X_k$ without changing the set of
solutions of a system, we assume in the following that $p = n$.

The ordering $\subseteq$ is defined on $T(\mathcal{F})^n$ by

$(L_1, \ldots, L_n) \subseteq (L'_1, \ldots, L'_n)$ iff $L_i \subseteq L'_i$ for all $i = 1, \ldots, n$

By definition $(\emptyset, \ldots, \emptyset)$ is the smallest element of $\subseteq$ and each increasing
sequence has an upper bound. To a system of equations, we associate the
fixed-point operator $TS : T(\mathcal{F})^n \to T(\mathcal{F})^n$ defined by:

$TS(L_1, \ldots, L_n) = (L'_1, \ldots, L'_n)$
where
$L'_1 = L_1 \cup s^1_1\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\} \cup \ldots \cup s^1_{m_1}\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\}$
$\ldots$
$L'_n = L_n \cup s^n_1\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\} \cup \ldots \cup s^n_{m_n}\{X_1 \leftarrow L_1, \ldots, X_n \leftarrow L_n\}$

Example 22. Let $S$ be

$Nat = 0 + s(Nat)$

$List = nil + cons(Nat, List)$

then

$TS(\emptyset, \emptyset) = (\{0\}, \{nil\})$

$TS^2(\emptyset, \emptyset) = (\{0, s(0)\}, \{nil, cons(0, nil)\})$
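
The first iterates of the fixed-point operator can be computed directly. The sketch below assumes that a system is encoded as a dictionary from variable names to lists of terms, with variables written as bare strings inside nested-tuple terms; `instances` and `ts` are illustrative names.

```python
from itertools import product

SYSTEM = {
    "Nat": [("0",), ("s", "Nat")],
    "List": [("nil",), ("cons", "Nat", "List")],
}


def instances(term, env):
    """term{X_1 <- L_1, ...}, where env maps each variable to its language."""
    if isinstance(term, str):  # a variable
        return env[term]
    head, *children = term
    return {(head, *combo)
            for combo in product(*(instances(c, env) for c in children))}


def ts(system, env):
    """One application of the operator TS to the tuple of languages `env`."""
    return {
        x: env[x] | set().union(*(instances(s, env) for s in rhs))
        for x, rhs in system.items()
    }


empty = {"Nat": set(), "List": set()}
step1 = ts(SYSTEM, empty)
step2 = ts(SYSTEM, step1)
print(step1)
print(step2)
```

Each application adds the trees that need one more unfolding of the equations, so iterating from the empty tuple enumerates the least solution level by level.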

Using a classical approach we use the fixed-point operator to compute the
least fixed-point solution of a system of equations.

Proposition 7. The fixed-point operator $TS$ is continuous and its least
fixed-point $TS^\omega(\emptyset, \ldots, \emptyset)$ is the least solution of $S$.

Proof. We show that $TS$ is continuous in order to use Knaster-Tarski's theorem
on continuous operators. By construction, $TS$ is monotonous, and the last
point is to prove that if $S_1 \subseteq S_2 \subseteq \ldots$ is an increasing sequence of n-tuples of
languages, the equality $TS(\bigcup_{i \geq 1} S_i) = \bigcup_{i \geq 1} TS(S_i)$ holds. By definition, each
$S_i$ can be written as $(S^i_1, \ldots, S^i_n)$.

• We have that $\bigcup_{i \geq 1} TS(S_i) \subseteq TS(\bigcup_{i \geq 1} S_i)$ holds since the sequence
$S_1 \subseteq S_2 \subseteq \ldots$ is increasing and the operator $TS$ is monotonous.

• Conversely we must prove $TS(\bigcup_{i \geq 1} S_i) \subseteq \bigcup_{i \geq 1} TS(S_i)$.

Let $v = (v^1, \ldots, v^n) \in TS(\bigcup_{i \geq 1} S_i)$. Then for each $k = 1, \ldots, n$
either $v^k \in \bigcup_{i \geq 1} S_i$ hence $v^k \in S_{l_k}$ for some $l_k$, or there is some
$u = (u^1, \ldots, u^n) \in \bigcup_{i \geq 1} S_i$ such that $v^k = s^k_{j_k}\{X_1 \leftarrow u^1, \ldots, X_n \leftarrow u^n\}$.
Since the sequence $(S_i)_{i \geq 1}$ is increasing we have that $u \in S_{l_k}$ for some $l_k$.
Therefore $v^k \in TS(S_L) \subseteq \bigcup_{i \geq 1} TS(S_i)$ for $L = \max\{l_k \mid k = 1, \ldots, n\}$.

□

We have introduced systems of regular equations to get an algebraic
characterization of regular tree languages stated in the following theorem.

Theorem 20. The least fixed-point solution of a system of regular equations
is a tuple of regular tree languages. Conversely each regular tree language is a
component of the least solution of a system of regular equations.

Proof. Let $S$ be a system of regular equations, and let $G_i = (X_i, \{X_1, \ldots, X_n\}, \mathcal{F}, R)$
where $R = \bigcup_{k = 1, \ldots, n} \{X_k \to s^k_1, \ldots, X_k \to s^k_{j_k}\}$ if the $k$-th equation of $S$ is
$X_k = s^k_1 + \ldots + s^k_{j_k}$. We show that $L(G_i)$ is the $i$-th component of $(L_1, \ldots, L_n)$,
the least fixed-point solution of $S$.

• We prove that $TS^p(\emptyset, \ldots, \emptyset) \subseteq (L(G_1), \ldots, L(G_n))$ by induction on $p$.

Let us assume that this property holds for all $p' \leq p$. Let $u = (u_1, \ldots, u_n)$
be an element of $TS^{p+1}(\emptyset, \ldots, \emptyset) = TS(TS^p(\emptyset, \ldots, \emptyset))$. For each $i$ in
$1, \ldots, n$, either $u_i \in TS^p(\emptyset, \ldots, \emptyset)$ and $u_i \in L(G_i)$ by induction
hypothesis, or there exist $v^i = (v^i_1, \ldots, v^i_n) \in TS^p(\emptyset, \ldots, \emptyset)$ and $s^i_j$ such that
$u_i = s^i_j\{X_1 \leftarrow v^i_1, \ldots, X_n \leftarrow v^i_n\}$. By induction hypothesis $v^i_j \in L(G_j)$ for
$j = 1, \ldots, n$ therefore $u_i \in L(G_i)$.

• We prove now that $(L(X_1), \ldots, L(X_n)) \subseteq TS^\omega(\emptyset, \ldots, \emptyset)$ by induction on
derivation length.

Let us assume that for each $i = 1, \ldots, n$, for each $p' \leq p$, if $X_i \to^{p'} u_i$ then
$u_i \in TS^{p'}(\emptyset, \ldots, \emptyset)$. Let $X_i \to^{p+1} u_i$, then $X_i \to s^i_j(X_1, \ldots, X_n) \to^{p} u_i$
with $u_i = s^i_j(v_1, \ldots, v_n)$ and $X_j \to^{p'} v_j$ for some $p' \leq p$. By induction
hypothesis $v_j \in TS^{p'}(\emptyset, \ldots, \emptyset)$ which yields that $u_i \in TS^{p+1}(\emptyset, \ldots, \emptyset)$.

Conversely, given a regular grammar $G = (S, \{A_1, \ldots, A_n\}, \mathcal{F}, R)$, with
$R = \{A_1 \to s^1_1, \ldots, A_1 \to s^1_{p_1}, \ldots, A_n \to s^n_1, \ldots, A_n \to s^n_{p_n}\}$, a similar proof yields
that the least solution of the system

$A_1 = s^1_1 + \ldots + s^1_{p_1}$
$\ldots$
$A_n = s^n_1 + \ldots + s^n_{p_n}$

is $(L(A_1), \ldots, L(A_n))$. □



Example 23. The grammar with axiom List, non-terminals List, Nat,
terminals 0, s(), nil, cons(,) and rules

$List \to nil$
$List \to cons(Nat, List)$
$Nat \to 0$
$Nat \to s(Nat)$

generates the second component of the least solution of the system given in
Example 22.



2.4 Context-free Word Languages and Regular 
Tree Languages 

Context-free word languages and regular tree languages are strongly related.
This is not surprising since derivation trees of context-free languages and
derivations of tree grammars look alike. For instance let us consider the context-free
language of arithmetic expressions on +, * and a variable x. A context-free word
grammar generating this set is $E \to x \mid E + E \mid E * E$ where $E$ is the axiom.
The generation of a word from the axiom can be described by a derivation tree
which has the axiom at the root and where the generated word can be read
by picking up the leaves of the tree from the left to the right (computing what
we call the yield of the tree). The rules for constructing derivation trees show
some regularity, which suggests that this set of trees is regular. The aim of this
section is to show that this is true indeed. However, there are some traps which
must be avoided when linking tree and word languages. First, we describe how
to relate words and trees. The symbols of $\mathcal{F}$ are used to build trees but also
words (by taking a symbol of $\mathcal{F}$ as a letter). The Yield operator computes a
word from a tree by concatenating the leaves of the tree from the left to the
right. More precisely, it is defined as follows.

$Yield(a) = a$ if $a \in \mathcal{F}_0$,

$Yield(f(s_1, \ldots, s_n)) = Yield(s_1) \ldots Yield(s_n)$ if $f \in \mathcal{F}_n$, $s_i \in T(\mathcal{F})$.

Example 24. Let $\mathcal{F} = \{x, +, *, E(,,)\}$ and let

$s = E(x, *, E(x, +, x))$

then $Yield(s) = x * x + x$ which is a word on $\{x, *, +\}$. Note that * and + are
not the usual binary operators but syntactical symbols of arity 0. If

$t = E(E(x, *, x), +, x)$

then $Yield(t) = x * x + x$.
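
The Yield operator is a left-to-right traversal of the leaves. The sketch below assumes terms encoded as nested tuples and represents a word as a list of symbols; the two sample trees are $E(x, *, E(x, +, x))$ and $E(E(x, *, x), +, x)$, and the name `yield_word` is illustrative.

```python
def yield_word(term):
    """Concatenate the leaves of `term` from left to right."""
    head, *children = term
    if not children:  # a symbol of arity 0 is a one-letter word
        return [head]
    return [leaf for child in children for leaf in yield_word(child)]


X, PLUS, TIMES = ("x",), ("+",), ("*",)
s = ("E", X, TIMES, ("E", X, PLUS, X))
t = ("E", ("E", X, TIMES, X), PLUS, X)
print(" ".join(yield_word(s)))
print(" ".join(yield_word(t)))
```

Both trees have the same yield although they are different trees, which illustrates that Yield is not injective.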



We recall that a context-free word grammar $G$ is a tuple $(S, N, T, R)$
where $S$ is the axiom, $N$ the set of non-terminal letters, $T$ the set of terminal
letters, $R$ the set of production rules of the form $A \to \alpha$ with $A \in N$,
$\alpha \in (T \cup N)^*$. The usual definition of derivation trees of context-free word languages
allows nodes labelled by a non-terminal $A$ to have a variable number of sons,
which is equal to the length of the right-hand side $\alpha$ of the rule $A \to \alpha$ used to
build the derivation tree at this node.

Since tree languages are defined for signatures where each symbol has a fixed
arity, we introduce a new symbol $(A, m)$ for each $A \in N$ such that there is a rule
$A \to \alpha$ with $\alpha$ of length $m$. Let $\mathcal{G}$ be the set composed of these new symbols
and of the symbols of $T$. The set of derivation trees issued from $a \in \mathcal{G}$, denoted
by $D(G, a)$, is the smallest set such that:

• $D(G, a) = \{a\}$ if $a \in T$,

• $(a, 0)(\epsilon) \in D(G, a)$ if $a \to \epsilon \in R$ where $\epsilon$ is the empty word,

• $(a, p)(t_1, \ldots, t_p) \in D(G, (a, p))$ if $t_1 \in D(G, a_1), \ldots, t_p \in D(G, a_p)$ and
$(a \to a_1 \ldots a_p) \in R$ where $a_i \in \mathcal{G}$.

The set of derivation trees of $G$ is $D(G) = \bigcup_{(S,i) \in \mathcal{G}} D(G, (S, i))$.

Example 25. Let $T = \{x, +, *\}$ and let $G$ be the context-free word grammar
with axiom $S$, non-terminal $Op$, and rules

$S \to S\ Op\ S$
$S \to x$
$Op \to +$
$Op \to *$

Let the word $u = x * x + x$, a derivation tree for $u$ with $G$ is $d_G(u)$, and the
same derivation tree with our notations is $D_G(u) \in D(G, S)$

$d_G(u) = S(S(x), Op(*), S(S(x), Op(+), S(x)))$

$D_G(u) = (S,3)((S,1)(x), (Op,1)(*), (S,3)((S,1)(x), (Op,1)(+), (S,1)(x)))$

By definition, the language generated by a context-free word grammar $G$ is
the set of words computed by applying the Yield operator to derivation trees of
$G$. The next theorem states how context-free word languages and regular tree
languages are related.

Theorem 21. The following statements hold.

1. Let $G$ be a context-free word grammar, then the set of derivation trees of
$L(G)$ is a regular tree language.

2. Let $L$ be a regular tree language then $Yield(L)$ is a context-free word
language.

3. There exists a regular tree language which is not the set of derivation trees
of a context-free language.

Proof. We give the proofs of the three statements.

1. Let $G = (S, N, T, R)$ be a context-free word grammar. We consider the
tree grammar $G' = (S, N, \mathcal{F}, R')$ such that

• the axiom and the set of non-terminal symbols of $G$ and $G'$ are the
same,

• $\mathcal{F} = T \cup \{\epsilon\} \cup \{(A, n) \mid A \in N, \exists A \to \alpha \in R \text{ with } \alpha \text{ of length } n\}$,

• if $A \to \epsilon$ then $A \to (A, 0)(\epsilon) \in R'$,

• if $(A \to a_1 \ldots a_p) \in R$ then $(A \to (A, p)(a_1, \ldots, a_p)) \in R'$.

Then $L(G) = \{Yield(s) \mid s \in L(G')\}$. The proof is a standard induction on
derivation length. It is interesting to remark that there may, and usually
do, exist several tree languages (not necessarily regular) such that the
corresponding word language obtained via the Yield operator is a given
context-free word language.

2. Let $G$ be a normalized tree grammar $(S, X, N, R)$. We build the word
context-free grammar $G' = (S, X, N, R')$ such that a rule $X \to X_1 \ldots X_n$
(resp. $X \to a$) is in $R'$ if and only if the rule $X \to f(X_1, \ldots, X_n)$ (resp.
$X \to a$) is in $R$ for some $f$. It is straightforward to prove by induction on
the length of derivation that $L(G') = Yield(L(G))$.

3. Let $G$ be the regular tree grammar with axiom $X$, non-terminals $X, Y, Z$,
terminals $a, b, f, g$ and rules

$X \to f(Y, Z)$
$Y \to g(a)$
$Z \to g(b)$

The language $L(G)$ consists of the single tree (arities have been indicated
explicitly to make the link with derivation trees):

$(f,2)((g,1)(a), (g,1)(b))$

Assume that $L(G)$ is the set of derivation trees of some context-free word
grammar. To generate the first node of the tree, one must have a rule
$F \to G\,G$ where $F$ is the axiom and rules $G \to a$, $G \to b$ (to get the inner
nodes). Therefore a tree such as

$(f,2)((g,1)(a), (g,1)(a))$

should be in $L(G)$ which is not the case.

□

2.5 Beyond Regular Tree Languages: Context- 
free Tree Languages 

For word languages, the story doesn't end with regular languages but there is a
strict hierarchy.

regular $\subset$ context-free $\subset$ recursively enumerable

Recursively enumerable tree languages are languages generated by tree
grammars as defined in the beginning of the chapter, and this class is far too general
for having good properties. Actually, any Turing machine can be simulated by
a one rule rewrite system which shows how powerful tree grammars are (any
grammar rule can be seen as a rewrite rule by considering both terminals and
non-terminals as syntactical symbols). Therefore, most of the research has been
done on context-free tree languages which we describe now.

2.5.1 Context-free Tree Languages

A context-free tree grammar is a tree grammar $G = (S, N, \mathcal{F}, R)$ where the
rules have the form $X(x_1, \ldots, x_n) \to t$ with $t$ a tree of $T(\mathcal{F} \cup N \cup \{x_1, \ldots, x_n\})$,
$x_1, \ldots, x_n \in \mathcal{X}$ where $\mathcal{X}$ is a set of reserved variables with $\mathcal{X} \cap (\mathcal{F} \cup N) = \emptyset$,
$X$ a non-terminal of arity $n$. The definition of the derivation relation is slightly
more complicated than for regular tree grammars: a term $t$ derives a term $t'$
if no variable of $\mathcal{X}$ occurs in $t$ or $t'$, there is a rule $l \to r$ of the grammar, a
substitution $\sigma$ such that the domain of $\sigma$ is included in $\mathcal{X}$ and a context $C$
such that $t = C[l\sigma]$ and $t' = C[r\sigma]$. The context-free tree language $L(G)$
is the set of trees which can be derived from the axiom of the context-free tree
grammar $G$.

Example 26. The grammar of axiom Prog, set of non-terminals {Prog, Nat, Fact()},
set of terminals {0, s, if(,), eq(,), not(), times(,), dec()} and rules

$Prog \to Fact(Nat)$
$Nat \to 0$
$Nat \to s(Nat)$
$Fact(x) \to if(eq(x, 0), s(0))$
$Fact(x) \to if(not(eq(x, 0)), times(x, Fact(dec(x))))$

where $\mathcal{X} = \{x\}$ is a context-free tree grammar. The reader can easily see that
the last rule is the classical definition of the factorial function.

The derivation relation associated to a context-free tree grammar $G$ is a
generalization of the derivation relation for regular tree grammars. The derivation
relation $\to$ is a relation on pairs of terms of $T(\mathcal{F} \cup N)$ such that $s \to t$ iff there is
a rule $X(x_1, \ldots, x_n) \to \alpha \in R$, a context $C$ such that $s = C[X(t_1, \ldots, t_n)]$ and
$t = C[\alpha\{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}]$. For instance, the previous grammar can yield
the sequence of derivations

$Prog \to Fact(Nat) \to Fact(0) \to if(eq(0, 0), s(0))$

The language generated by $G$, denoted by $L(G)$, is the set of terms of $T(\mathcal{F})$
which can be reached by successive derivations starting from the axiom. Such
languages are called context-free tree languages. Context-free tree languages are
closed under union, concatenation and closure. As in the word case, one can
define pushdown tree automata which recognize exactly the set of context-free
tree languages. We discuss only IO and OI grammars and we refer the reader
to the bibliographic notes for more information.

2.5.2 IO and OI Tree Grammars 

Context-free tree grammars have been extensively studied in connection with
the theory of recursive program schemes. A non-terminal $F$ can be seen as
a function name and production rules $F(x_1, \ldots, x_n) \to t$ define the function.
Recursive definitions are allowed since $t$ may contain occurrences of $F$. Since we
know that such recursive definitions may not give the same results depending
on the evaluation strategy, IO and OI tree grammars have been introduced to
account for such differences.

A context-free grammar is IO (for innermost-outermost) if we restrict legal
derivations to derivations where the innermost non-terminals are derived first. This
control corresponds to call by value evaluation. A context-free grammar is OI
(for outermost-innermost) if we restrict legal derivations to derivations where
the outermost non-terminals are derived first. This corresponds to call by name
evaluation. Therefore, given one context-free grammar $G$, we can define IO-$G$
and OI-$G$ and the next example shows that the languages generated by these
grammars may be different.

Example 27. Let $G$ be the context-free grammar with axiom $Exp$,
non-terminals $\{Exp, Nat, Dup\}$, terminals $\{double, s, 0\}$ and rules

$Exp \to Dup(Nat)$

$Nat \to s(Nat)$

$Nat \to 0$

$Dup(x) \to double(x, x)$

Then outermost-innermost derivations have the form

$Exp \to Dup(Nat) \to double(Nat, Nat) \xrightarrow{*} double(s^n(0), s^m(0))$

while innermost-outermost derivations have the form

$Exp \to Dup(Nat) \xrightarrow{*} Dup(s^n(0)) \to double(s^n(0), s^n(0))$

Therefore $L(\text{OI-}G) = \{double(s^n(0), s^m(0)) \mid n, m \in \mathbb{N}\}$ and
$L(\text{IO-}G) = \{double(s^n(0), s^n(0)) \mid n \in \mathbb{N}\}$.
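The difference between the two strategies in Example 27 can be checked by brute force. The following Python sketch is not a general derivation engine: it hard-codes the four rules of $G$, represents terms as strings, and bounds the number of `s` symbols by a parameter `max_depth` (the function names and the bound are assumptions made for this illustration only).

```python
from itertools import product

def nat_terms(max_depth):
    """All terms s^n(0) with n <= max_depth, i.e. the terms derivable from Nat."""
    return ["s(" * n + "0" + ")" * n for n in range(max_depth + 1)]

def language_io(max_depth):
    """IO (call by value): Nat is fully derived before Dup duplicates it."""
    return {f"double({t},{t})" for t in nat_terms(max_depth)}

def language_oi(max_depth):
    """OI (call by name): Dup duplicates Nat first, then each copy is derived
    independently of the other."""
    nats = nat_terms(max_depth)
    return {f"double({t},{u})" for t, u in product(nats, nats)}

io_lang = language_io(2)
oi_lang = language_oi(2)
# On this bounded fragment the IO language is strictly included in the OI one.
assert io_lang < oi_lang
```

On the bounded fragment, `double(s(0),0)` belongs to `oi_lang` but not to `io_lang`, which is the strictness witnessed by the example.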

A tree language $L$ is IO if there is some context-free grammar $G$ such that
$L = L(\text{IO-}G)$. The next theorem shows the relation between
$L(\text{IO-}G)$, $L(\text{OI-}G)$ and $L(G)$.

Theorem 22. The following inclusion holds:
$L(\text{IO-}G) \subseteq L(\text{OI-}G) = L(G)$

Example 27 shows that the inclusion can be strict. IO-languages are closed
under intersection with regular languages and union, but the closure under
concatenation requires another definition of concatenation: all occurrences of
a constant generated by a non right-linear rule are replaced by the same term,
as shown by the next example.



Example 28. Let $G$ be the context-free grammar with axiom $Exp$,
non-terminals $\{Exp, Nat, Fct\}$, terminals $\{\square, f(\_,\_,\_)\}$ and rules

$Exp \to Fct(Nat, Nat)$

$Nat \to \square$

$Fct(x, y) \to f(x, x, y)$

and let $L = L(\text{IO-}G)$ and $M = \{0, 1\}$, then $L \cdot_\square M$ contains
$f(0,0,0)$, $f(0,0,1)$, $f(1,1,0)$, $f(1,1,1)$ but not $f(1,0,1)$ nor
$f(0,1,1)$.

There is a lot of work on the extension of results on context-free word
grammars and languages to context-free tree grammars and languages.
Unfortunately, many constructions and theorems cannot be lifted to the tree
case. Usually the failure is due to non-linearity, which expresses that the
same subtrees must occur at different positions in the tree. A similar
phenomenon occurred when we stated results on recognizable languages and tree
homomorphisms: the inverse image of a recognizable tree language by a tree
homomorphism is recognizable, but the assumption that the homomorphism is
linear is needed to show that the direct image is recognizable.

2.6 Exercises 

Exercise 20. Let $\mathcal{F} = \{f(,), g(), a\}$. Consider the automaton
$\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ defined by: $Q = \{q, q_g, q_f\}$,
$Q_f = \{q_f\}$, and $\Delta$ =

{ $a \to q(a)$                      $g(q(x)) \to q(g(x))$

  $g(q(x)) \to q_g(g(x))$           $g(q_g(x)) \to q_f(g(x))$

  $f(q(x), q(y)) \to q(f(x, y))$ }.

Define a regular tree grammar generating $L(\mathcal{A})$.

Exercise 21.

1. Prove the equivalence of a regular tree grammar and of the reduced regular
tree grammar computed by the algorithm of Proposition 2.

2. Let $\mathcal{F} = \{f(,), g(), a\}$. Let $G$ be the regular tree grammar
with axiom $X$, non-terminal $A$, and rules

$X \to f(g(A), A)$
$A \to g(g(A))$

Define a top-down NFTA, a NFTA and a DFTA for $L(G)$. Is it possible to
define a top-down DFTA for this language?

Exercise 22. Let $\mathcal{F} = \{f(,), a\}$. Let $G$ be the regular tree
grammar with axiom $X$, non-terminals $A$, $B$, $C$ and rules

$X \to C$
$X \to a$
$X \to A$
$A \to f(A, B)$
$B \to a$

Compute the reduced regular tree grammar equivalent to $G$ applying the
algorithm defined in the proof of Proposition 2. Now, consider the same
algorithm, but first apply step 2 and then step 1. Is the output of this
algorithm reduced? Is it equivalent to $G$?

Exercise 23. 

1. Prove Theorem 6 using regular tree grammars. 

2. Prove Theorem 7 using regular tree grammars. 

Exercise 24. (Local languages) Let $\mathcal{F}$ be a signature, let $t$ be a
term of $T(\mathcal{F})$, then we define $fork(t)$ as follows:

• $fork(a) = \emptyset$, for each constant symbol $a$;

• $fork(f(t_1, \ldots, t_n)) = \{f(Head(t_1), \ldots, Head(t_n))\} \cup \bigcup_{i=1}^{n} fork(t_i)$

A tree language $L$ is local if and only if there exist a set
$\mathcal{F}' \subseteq \mathcal{F}$ and a set
$G \subseteq fork(T(\mathcal{F}))$ such that $t \in L$ iff
$root(t) \in \mathcal{F}'$ and $fork(t) \subseteq G$. Prove that every local
tree language is a regular tree language. Prove that a language is local iff it
is the set of derivation trees of a context-free word language.
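The two clauses defining $fork$ translate directly into a recursive function. The Python sketch below assumes a term is encoded as a tuple `(symbol, child_1, ..., child_n)` and a fork as the tuple of the root symbol followed by the head symbols of the children; both encodings and the helper name `in_local_language` are choices made for this illustration.

```python
def head(t):
    """Root symbol of a term encoded as (symbol, child_1, ..., child_n)."""
    return t[0]

def fork(t):
    """Set of forks of t, following the two clauses of the definition."""
    symbol, children = t[0], t[1:]
    if not children:                 # constant symbol: no fork
        return set()
    forks = {(symbol,) + tuple(head(c) for c in children)}
    for c in children:
        forks |= fork(c)
    return forks

def in_local_language(t, allowed_roots, allowed_forks):
    """Membership test for the local language given by (F', G)."""
    return head(t) in allowed_roots and fork(t) <= allowed_forks

a = ("a",)
t = ("f", ("g", a), ("f", a, a))     # the term f(g(a), f(a, a))
forks_of_t = fork(t)
```

For the term `t` above, `forks_of_t` contains the three forks `f(g, f)`, `g(a)` and `f(a, a)`.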

Exercise 25. The pumping lemma for context-free word languages states:

for each context-free language $L$, there is some constant $k \geq 1$ such
that each $z \in L$ of length greater than or equal to $k$ can be written
$z = uvwxy$ such that $vx$ is not the empty word, $vwx$ has length less than
or equal to $k$, and for each $n \geq 0$, the word $uv^nwx^ny$ is in $L$.

Prove this result using the pumping lemma for tree languages and the results of
this chapter.

Exercise 26. Another possible definition for the iteration of a language is:

• $L^{0,\square} = \{\square\}$

• $L^{n+1,\square} = L^{n,\square} \cup L^{n,\square} \cdot_\square L$

(Unfortunately that definition was given in the previous version of TATA)

1. Show that this definition may generate non-regular tree languages. Hint: one
binary symbol $f(,)$ and $\square$ are enough.

2. Are the two definitions equivalent (i.e. generate the same languages) if
$\Sigma$ consists of unary symbols and constants only?

Exercise 27. Let $\mathcal{F}$ be a ranked alphabet, let $t$ be a term of
$T(\mathcal{F})$, then we define the word language $Branch(t)$ as follows:

• $Branch(a) = a$, for each constant symbol $a$;

• $Branch(f(t_1, \ldots, t_n)) = \bigcup_{i=1}^{n} \{f u \mid u \in Branch(t_i)\}$

Let $L$ be a regular tree language, prove that
$Branch(L) = \bigcup_{t \in L} Branch(t)$ is a regular word language. What
about the converse?
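As a small illustration of the definition of $Branch$, the following Python sketch computes the branch words of a single term. It assumes terms are tuples `(symbol, child_1, ..., child_n)` and that every symbol is a one-character string, so that a word is simply a Python string; these are conventions of the sketch, not of the text.

```python
def branch(t):
    """Set of words read along the root-to-leaf paths of the term t."""
    symbol, children = t[0], t[1:]
    if not children:                 # constant: the single word made of that symbol
        return {symbol}
    return {symbol + u for c in children for u in branch(c)}

a, b = ("a",), ("b",)
t = ("f", ("g", a), ("f", a, b))     # the term f(g(a), f(a, b))
words = branch(t)
```

Here `words` is the set of the three words `fga`, `ffa` and `ffb`.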

Exercise 28.

1. Let $\mathcal{F}$ be a ranked alphabet such that $\mathcal{F}_0 = \{a, b\}$.
Find a regular tree language $L$ such that $Yield(L) = \{a^n b^n \mid n > 0\}$.
Find a non regular tree language $L$ such that
$Yield(L) = \{a^n b^n \mid n > 0\}$.

2. Same questions with $Yield(L) = \{u \in \mathcal{F}_0^* \mid |u|_a = |u|_b\}$
where $|u|_a$ (respectively $|u|_b$) denotes the number of $a$ (respectively
the number of $b$) in $u$.

3. Let $\mathcal{F}$ be a ranked alphabet such that
$\mathcal{F}_0 = \{a, b, c\}$, let $A_1 = \{a^n b^n c^p \mid n, p > 0\}$, and let
$A_2 = \{a^n b^p c^p \mid n, p > 0\}$. Find regular tree languages such that
$Yield(L_1) = A_1$ and $Yield(L_2) = A_2$. Does there exist a regular tree
language such that $Yield(L) = A_1 \cap A_2$?

Exercise 29.

1. Let $G$ be the context free word grammar with axiom $X$, terminals $a$, $b$,
and rules

$X \to XX$
$X \to aXb$
$X \to \epsilon$

where $\epsilon$ stands for the empty word. What is the word language $L(G)$?
Give a derivation tree for $u = aabbab$.

2. Let $G'$ be the context free word grammar in Greibach normal form with axiom
$X'$, non terminals $X'$, $Y'$, $Z'$, terminals $a$, $b$, and rules

$X' \to aX'Y'$
$X' \to aY'$
$X' \to aX'Z'$
$X' \to aZ'$
$Y' \to bX'$
$Z' \to b$

Prove that $L(G') = L(G)$. Give a derivation tree for $u = aabbab$.

3. Find a context free word grammar $G''$ such that $L(G'') = A_1 \cup A_2$
($A_1$ and $A_2$ are defined in Exercise 28). Give two derivation trees for
$u = abc$.

Exercise 30. Let $\mathcal{F}$ be a ranked alphabet.

1. Let $L$ and $L'$ be two regular tree languages. Compare the sets
$Yield(L \cap L')$ and $Yield(L) \cap Yield(L')$.

2. Let $A$ be a subset of $\mathcal{F}_0$. Prove that
$T(\mathcal{F}, A) = T(\mathcal{F} \cap A)$ is a regular tree language. Let $L$
be a regular tree language over $\mathcal{F}$, compare the sets
$Yield(L \cap T(\mathcal{F}, A))$ and $Yield(L) \cap Yield(T(\mathcal{F}, A))$.

3. Let $R$ be a regular word language over $\mathcal{F}_0$. Let
$T(\mathcal{F}, R) = \{t \in T(\mathcal{F}) \mid Yield(t) \in R\}$. Prove that
$T(\mathcal{F}, R)$ is a regular tree language. Let $L$ be a regular tree
language over $\mathcal{F}$, compare the sets $Yield(L \cap T(\mathcal{F}, R))$
and $Yield(L) \cap Yield(T(\mathcal{F}, R))$. As a consequence of the results
obtained in the present exercise, what could be said about the intersection of
a context free word language and of a regular tree language?



2.7 Bibliographic notes

This chapter only scratches the topic of tree grammars and related topics. A
useful reference on algebraic aspects of regular tree languages is [GS84], which
contains a lot of classical results on these features. There is a huge
literature on tree grammars and related topics, which is also relevant for the
chapter on tree transducers, see the references given in this chapter. Systems
of equations can be generalized to formal tree series with similar results
[BR82, Boz99, Boz01, Kui99, Kui01]. The notion of pushdown tree automaton has
been introduced by Guessarian [Gue83] and generalized to formal tree series by
Kuich [Kui01]. The reader may consult [Eng82, ES78] for IO and OI grammars. The
connection between recursive program schemes and formalisms for regular tree
languages is also well-known, see [Cou86] for instance. We should mention that
some open problems like equivalence of deterministic tree grammars are now
solved using the result of Sénizergues on the equivalence of deterministic
pushdown word automata [Sén97].

Chapter 3

Logic, Automata and 
Relations 

3.1 Introduction 

As early as in the 50s, automata, and in particular tree automata, played an
important role in the development of verification. Several well-known
logicians, such as A. Church, J.R. Büchi, Elgot, McNaughton, M. Rabin and
others contributed to what is called "the trinity" by Trakhtenbrot: Logic,
Automata and Verification (of Boolean circuits).

The idea is simple: given a formula $\phi$ with free variables
$x_1, \ldots, x_n$ and a domain of interpretation $D$, $\phi$ defines the
subset of $D^n$ containing all assignments of the free variables
$x_1, \ldots, x_n$ that satisfy $\phi$. Hence formulas in this case are just a
way of defining subsets of $D^n$ (also called $n$-ary relations on $D$). In
case $n = 1$ (and, as we will see, also for $n > 1$), finite automata provide
another way of defining subsets of $D^n$. In 1960, Büchi realized that these
two ways of defining relations over the free monoid $\{0, \ldots, n\}^*$
coincide when the logic is the sequential calculus, also called weak
second-order monadic logic with one successor, WS1S. This result was extended
to tree automata: Doner, Thatcher and Wright showed that the definability in
the weak second-order monadic logic with $k$ successors, WSkS, coincides with
the recognizability by a finite tree automaton. These results imply in
particular the decidability of WSkS, following the decision results on tree
automata (see chapter 1).

These ideas are the basis of several decision techniques for various logics,
some of which will be listed in Section 3.4. In order to illustrate this
correspondence, consider Presburger's arithmetic: the atomic formulas are
equalities and inequalities $s = t$ or $s > t$ where $s$, $t$ are sums of
variables and constants. For instance $x+y+y = z+z+z+1+1$, also written
$x+2y = 3z+2$, is an atomic formula. In other words, atomic formulas are linear
Diophantine (in)equations. Then atomic formulas can be combined using any
logical connectives among $\wedge$, $\vee$, $\neg$ and quantifications
$\forall$, $\exists$. For instance
$\forall x.\,((\forall y.\,\neg(x = 2y)) \Rightarrow (\exists y.\,x = 2y+1))$ is
a (true) formula of Presburger's arithmetic. Formulas are interpreted in the
natural numbers (non-negative integers), each symbol having its expected
meaning. A solution of a formula $\phi(x)$ whose only free variable is $x$ is
an assignment of $x$ to a natural number $n$ such that $\phi(n)$ holds true in
the interpretation. For instance, if $\phi(x)$ is the formula
$\exists y.\,x = 2y$, its solutions are the even numbers.

Figure 3.1: The automaton which accepts the solutions of $x = y + z$

Writing integers in base 2, they can be viewed as elements of the free monoid
$\{0, 1\}^*$, i.e. words of 0s and 1s. The representation of a natural number
is not unique as $01 = 1$, for instance. Tuples of natural numbers are
displayed by stacking their representations in base 2 and aligning on the
right, then completing with some 0s on the left in order to get a rectangle of
bits. For instance the pair $(13, 6)$ is represented as 1101 stacked over 0110
(or 01101 stacked over 00110 as well). Hence, we can see the solutions of a
formula as a subset of $(\{0, 1\}^n)^*$ where $n$ is the number of free
variables of the formula.

It is not difficult to see that the set of solutions of any atomic formula is
recognized by a finite word automaton working on the alphabet $\{0, 1\}^n$. For
instance, the solutions of $x = y + z$ are recognized by the automaton of
Figure 3.1.
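The stacking of binary representations and the recognition of $x = y + z$ can be sketched in Python as follows. This is only one possible automaton, not necessarily the one drawn in Figure 3.1: the sketch assumes that the word of columns is read from the least significant bit, and its single piece of state is the current carry. The names `encode` and `accepts_sum` are hypothetical.

```python
def encode(numbers):
    """Stack the base-2 representations, right-aligned and padded with 0s on
    the left; the result is a word whose letters are columns of bits."""
    width = max(n.bit_length() for n in numbers) or 1
    rows = [format(n, "b").zfill(width) for n in numbers]
    return list(zip(*rows))

def accepts_sum(word):
    """Run a two-state automaton (state = carry) on columns (x, y, z),
    reading the least significant column first (an assumption of this sketch)."""
    carry = 0
    for x, y, z in reversed(word):
        total = int(y) + int(z) + carry
        if int(x) != total % 2:
            return False             # no transition available: reject
        carry = total // 2
    return carry == 0                # accept only if no carry is left over

pair_13_6 = encode((13, 6))          # the pair (13, 6) of the text
```

With these definitions, `accepts_sum(encode((13, 7, 6)))` holds while `accepts_sum(encode((13, 6, 6)))` does not.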

Then, and that is probably one of the key ideas, each logical connective
corresponds to a basic operation on automata (here word automata): $\vee$ is a
union, $\wedge$ an intersection, $\neg$ a complement, $\exists x$ a projection
(an operation which will be defined in Section 3.2.4). It follows that the set
of solutions of any Presburger formula is recognized by a finite automaton.

In particular, a closed formula (without free variable) holds true in the
interpretation if the initial state of the automaton is also final. It holds
false otherwise. Therefore, this gives both a decision technique for Presburger
formulas by computing automata and an effective representation of the set of
solutions for open formulas.

The example of Presburger's arithmetic we just sketched is not isolated. One
of the purposes of this chapter is to show how to relate finite tree automata
and formulas.

In general, the problem with these techniques is to design an appropriate
notion of automaton, which is able to recognize the solutions of atomic
formulas and which has the desired closure and decision properties. We have to
cite here the famous Rabin automata, which work on infinite trees and which
have indeed the closure and decidability properties, allowing to decide the
full second-order monadic logic with $k$ successors (a result due to M. Rabin,
1969). It is however out of the scope of this book to survey automata
techniques in logic and computer science. We restrict our attention to finite
automata on finite trees and refer to the excellent surveys [Rab77, Tho90] for
more details on other applications of automata to logic.

We start this chapter by reviewing some possible definitions of automata
on pairs (or, more generally, tuples) of finite trees in Section 3.2. We define
in this way several notions of recognizability for relations, which are not
necessarily unary, extending the frame of chapter 1. This extension is
necessary since automata recognizing the solutions of formulas actually
recognize $n$-tuples of solutions, if there are $n$ free variables in the
formula.

The most natural way of defining a notion of recognizability on tuples is to
consider products of recognizable sets. Though this happens to be sometimes
sufficient, this notion is often too weak. For instance the example of Figure
3.1 could not be defined as a product of recognizable sets. Rather, we stacked
the words and recognized these codings. Such a construction can be generalized
to trees (we have to overlap instead of stacking) and gives rise to a second
notion of recognizability. We will also introduce a third class called "Ground
Tree Transducers" which is weaker than the second class above but enjoys
stronger closure properties, for instance by iteration. Its usefulness will
become evident in Section 3.4.

Next, in Section 3.3, we introduce the weak second-order monadic logic with
$k$ successors and show Thatcher and Wright's theorem which relates this logic
with finite tree automata. This is a modest insight into the relations between
logic and automata.

Finally in Section 3.4 we survey a number of applications, mostly issued
from Term Rewriting or Constraint Solving. We do not detail this part (we
give references instead). The goal is to show how the simple techniques
developed before can be applied to various questions, with a special emphasis
on decision problems. We consider the theories of sort constraints in Section
3.4.1, the theory of linear encompassment in Section 3.4.2, the theory of
ground term rewriting in Section 3.4.3 and reduction strategies in orthogonal
term rewriting in Section 3.4.4. Other examples are given as exercises in
Section 3.5 or considered in chapters 4 and 5.

3.2 Automata on Tuples of Finite Trees 

3.2.1 Three Notions of Recognizability 

Let $Rec_\times$ be the subset of $n$-ary relations on $T(\mathcal{F})$ which
are finite unions of products $S_1 \times \ldots \times S_n$ where
$S_1, \ldots, S_n$ are recognizable subsets of $T(\mathcal{F})$. This notion of
recognizability of pairs is the simplest one can imagine. Automata for such
relations consist of pairs of tree automata which work independently. This
notion is however quite weak, as e.g. the diagonal

$\Delta = \{(t, t) \mid t \in T(\mathcal{F})\}$

does not belong to $Rec_\times$. Actually a relation $R \in Rec_\times$ does not
really relate its components!

The second notion of recognizability is used in the correspondence with
WSkS and is strictly stronger than the above one. Roughly, it consists in
overlapping the components of an $n$-tuple, yielding a term on a product
alphabet. Then define $Rec$ as the set of sets of pairs of terms whose
overlapping coding is recognized by a tree automaton on the product alphabet.

Figure 3.2: The overlap of two terms



Let us first define more precisely the notion of "coding". (This is illustrated
by an example on Figure 3.2.) We let
$\mathcal{F}' = (\mathcal{F} \cup \{\bot\})^n$, where $\bot$ is a new symbol.
This is the idea of "stacking" the symbols, as in the introductory example of
Presburger's arithmetic. Let $k$ be the maximal arity of a function symbol in
$\mathcal{F}$. Assuming $\bot$ has arity 0, the arities of function symbols in
$\mathcal{F}'$ are defined by
$a(f_1 \ldots f_n) = \max(a(f_1), \ldots, a(f_n))$.

The coding of two terms $t_1, t_2 \in T(\mathcal{F})$ is defined by induction:

$[f(t_1, \ldots, t_n), g(u_1, \ldots, u_m)] = fg([t_1, u_1], \ldots, [t_m, u_m], [t_{m+1}, \bot], \ldots, [t_n, \bot])$

if $n \geq m$ and

$[f(t_1, \ldots, t_n), g(u_1, \ldots, u_m)] = fg([t_1, u_1], \ldots, [t_n, u_n], [\bot, u_{n+1}], \ldots, [\bot, u_m])$

if $m \geq n$.

More generally, the coding of $n$ terms
$f_1(t_1^1, \ldots, t_1^{k_1}), \ldots, f_n(t_n^1, \ldots, t_n^{k_n})$ is
defined as

$f_1 \ldots f_n([t_1^1, \ldots, t_n^1], \ldots, [t_1^m, \ldots, t_n^m])$

where $m$ is the maximal arity of $f_1, \ldots, f_n \in \mathcal{F}$ and
$t_i^j$ is, by convention, $\bot$ when $j > k_i$.

Definition 4. $Rec$ is the set of relations $R \subseteq T(\mathcal{F})^n$ such
that

$\{[t_1, \ldots, t_n] \mid (t_1, \ldots, t_n) \in R\}$

is recognized by a finite tree automaton on the alphabet
$\mathcal{F}' = (\mathcal{F} \cup \{\bot\})^n$.
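The inductive definition of the coding can be sketched as a short recursive function. In the Python sketch below, a term is a tuple `(symbol, child_1, ..., child_n)`, the padding symbol $\bot$ is the string `"bot"`, and a symbol of the product alphabet is a tuple of symbols; these encodings and the name `coding` are assumptions of the sketch.

```python
BOT = "bot"   # stands for the padding symbol, of arity 0

def coding(*terms):
    """Overlap coding [t_1, ..., t_n]; each argument is a term or BOT."""
    symbols = tuple(BOT if t == BOT else t[0] for t in terms)
    children = [() if t == BOT else t[1:] for t in terms]
    arity = max(len(c) for c in children)          # max of the component arities
    padded = [c + (BOT,) * (arity - len(c)) for c in children]
    # The j-th child of the coding overlaps the j-th children of all components.
    return (symbols,) + tuple(coding(*column) for column in zip(*padded))

a, om = ("a",), ("Om",)
t = ("f", ("g", om), ("g", om))                    # f(g(Om), g(Om))
u = ("f", ("g", ("g", a)), ("g", om))              # f(g(g(a)), g(Om))
coded = coding(t, u)
```

The value `coded` is the nested tuple corresponding to the term $ff(gg(\Omega g(\bot a)), gg(\Omega\Omega))$, where `"Om"` plays the role of a constant $\Omega$.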

For example, consider the diagonal $\Delta$: it is in $Rec$ since its coding is
recognized by the bottom-up tree automaton whose only state is $q$ (also a
final state) and whose transitions are the rules $ff(q, \ldots, q) \to q$ for
all symbols $f \in \mathcal{F}$.

One drawback of this second notion of recognizability is that it is not closed
under iteration. More precisely, there is a binary relation $R$ which belongs
to $Rec$ and whose transitive closure is not in $Rec$ (see Section 3.5). For
this reason, a third class of recognizable sets of pairs of trees was
introduced: the Ground Tree Transducers (GTT for short).

Definition 5. A GTT is a pair of bottom-up tree automata
$(\mathcal{A}_1, \mathcal{A}_2)$ working on the same alphabet. Their sets of
states may however share some symbols (the synchronization states).

A pair $(t, t')$ is recognized by a GTT $(\mathcal{A}_1, \mathcal{A}_2)$ if
there is a context $C \in \mathcal{C}^n(\mathcal{F})$ such that
$t = C[t_1, \ldots, t_n]$, $t' = C[t'_1, \ldots, t'_n]$ and there are states
$q_1, \ldots, q_n$ of both automata such that, for all $i$,
$t_i \xrightarrow[\mathcal{A}_1]{*} q_i$ and
$t'_i \xrightarrow[\mathcal{A}_2]{*} q_i$. We write
$L(\mathcal{A}_1, \mathcal{A}_2)$ the language accepted by the GTT
$(\mathcal{A}_1, \mathcal{A}_2)$, i.e. the set of pairs of terms which are
recognized.

Figure 3.3: GTT acceptance



The recognizability by a GTT is depicted on Figure 3.3. For instance, $\Delta$
is accepted by a GTT. Another typical example is the binary relation "one step
parallel rewriting" for term rewriting systems whose left members are linear
and whose right hand sides are ground (see Section 3.4.3).
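The acceptance condition of a GTT can be turned into a naive recursive test: either the two terms reach a common state (the context is cut at this position), or they have the same root symbol and their arguments are pairwise accepted. The Python sketch below follows this reading; the encoding of transition rules as a dictionary from `(symbol, tuple_of_states)` to a set of states is an assumption of the sketch, and the sample automata are those of Example 31 in Section 3.2.2, with `*` written for $\times$.

```python
from itertools import product

def reachable(term, rules):
    """Set of states q such that term ->* q for a bottom-up automaton.
    rules maps (symbol, (state_1, ..., state_n)) to a set of states."""
    symbol, children = term[0], term[1:]
    child_states = [reachable(c, rules) for c in children]
    states = set()
    for combo in product(*child_states):
        states |= rules.get((symbol, combo), set())
    return states

def gtt_accepts(t, u, rules1, rules2):
    """True iff t = C[t_1..t_n], u = C[u_1..u_n] with t_i ->* q_i in A_1
    and u_i ->* q_i in A_2 for some states q_i common to both automata."""
    if reachable(t, rules1) & reachable(u, rules2):
        return True                                # cut the context here
    same_root = t[0] == u[0] and len(t) == len(u)
    return same_root and all(
        gtt_accepts(a, b, rules1, rules2) for a, b in zip(t[1:], u[1:]))

rules1 = {("0", ()): {"qT", "q0"}, ("1", ()): {"qT"},
          ("+", ("qT", "qT")): {"qT"}, ("*", ("qT", "qT")): {"qT"},
          ("*", ("q0", "qT")): {"q0"}}
rules2 = {("0", ()): {"q0"}}

zero, one = ("0",), ("1",)
t = ("+", one, ("*", ("*", zero, one), one))       # 1 + ((0 * 1) * 1)
u = ("+", one, zero)                               # 1 + 0
```

With these definitions `gtt_accepts(t, u, rules1, rules2)` holds, whereas `gtt_accepts(one, zero, rules1, rules2)` does not. The test recomputes reachable states at every position, so it is meant as an executable reading of the definition rather than as an efficient procedure.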

3.2.2 Examples of The Three Notions of Recognizability 

The first example illustrates $Rec_\times$. It will be developed in a more
general framework in Section 3.4.2.

Example 29. Consider the alphabet $\mathcal{F} = \{f, g, a\}$ where $f$ is
binary, $g$ is unary and $a$ is a constant. Let $P$ be the predicate which is
true on $t$ if there are terms $t_1$, $t_2$ such that $f(g(t_1), t_2)$ is a
subterm of $t$. Then the solutions of $P(x) \wedge P(y)$ define a relation in
$Rec_\times$, using twice the following automaton:

$Q = \{q_f, q_g, q_\top\}$

$Q_f = \{q_f\}$

$T$ = { $a \to q_\top$            $f(q_\top, q_\top) \to q_\top$
        $g(q_\top) \to q_\top$    $f(q_f, q_\top) \to q_f$
        $g(q_f) \to q_f$          $f(q_g, q_\top) \to q_f$
        $g(q_\top) \to q_g$       $f(q_\top, q_f) \to q_f$ }

For instance the pair $(g(f(g(a), g(a))), f(g(g(a)), a))$ is accepted by the
pair of automata.
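To make the example concrete, the automaton of Example 29 can be run on both components of the pair independently, which is exactly what membership in $Rec_\times$ amounts to for a single product. The Python sketch below uses the state names `qT`, `qg`, `qf` for $q_\top$, $q_g$, $q_f$ and encodes terms as tuples `(symbol, child_1, ..., child_n)`; the function names are hypothetical.

```python
from itertools import product

RULES = {
    ("a", ()): {"qT"},
    ("g", ("qT",)): {"qT", "qg"},
    ("g", ("qf",)): {"qf"},
    ("f", ("qT", "qT")): {"qT"},
    ("f", ("qf", "qT")): {"qf"},
    ("f", ("qg", "qT")): {"qf"},
    ("f", ("qT", "qf")): {"qf"},
}
FINAL = {"qf"}

def states(term):
    """All states reachable from term (the automaton is non-deterministic)."""
    symbol, children = term[0], term[1:]
    found = set()
    for combo in product(*(states(c) for c in children)):
        found |= RULES.get((symbol, combo), set())
    return found

def satisfies_p(term):
    """True iff the automaton accepts term."""
    return bool(states(term) & FINAL)

def in_relation(pair):
    """Membership in the product: each component is checked on its own."""
    return all(satisfies_p(t) for t in pair)

a = ("a",)
t1 = ("g", ("f", ("g", a), ("g", a)))              # g(f(g(a), g(a)))
t2 = ("f", ("g", ("g", a)), a)                     # f(g(g(a)), a)
```

The pair `(t1, t2)` of the example is accepted, while `(t1, a)` is rejected since its second component contains no subterm of the form $f(g(t_1), t_2)$.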



The second example illustrates $Rec$. Again, it is a first account of the
developments of Section 3.4.4.

Example 30. Let $\mathcal{F} = \{f, g, a, \Omega\}$ where $f$ is binary, $g$ is
unary, $a$ and $\Omega$ are constants. Let $R$ be the set of pairs of terms
$(t, u)$ such that $u$ can be obtained from $t$ by replacing each occurrence of
$\Omega$ by some term in $T(\mathcal{F})$ (each occurrence of $\Omega$ need not
be replaced with the same term). Using the notations of Chapter 2,

$R(t, u) \iff u \in t \cdot_\Omega T(\mathcal{F})$

$R$ is recognized by the following automaton (on codings of pairs):

$Q = \{q, q'\}$

$Q_f = \{q'\}$

$T$ = { $\bot a \to q$             $\bot f(q, q) \to q$
        $\bot g(q) \to q$          $\Omega f(q, q) \to q'$
        $\bot \Omega \to q$        $ff(q', q') \to q'$
        $aa \to q'$                $gg(q') \to q'$
        $\Omega\Omega \to q'$      $\Omega g(q) \to q'$
        $\Omega a \to q'$ }

For instance, the pair $(f(g(\Omega), g(\Omega)), f(g(g(a)), g(\Omega)))$ is
accepted by the automaton: the overlap of the two terms yields

$[t, u] = ff(gg(\Omega g(\bot a)), gg(\Omega\Omega))$

And the reduction:

$[t, u] \xrightarrow{*} ff(gg(\Omega g(q)), gg(q'))$

$\xrightarrow{*} ff(gg(q'), q')$

$\to ff(q', q')$

$\to q'$
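The reduction of Example 30 can be replayed mechanically. The Python sketch below writes the coded term by hand as nested tuples whose symbols are pairs, with `"bot"` standing for $\bot$ and `"Om"` for $\Omega$, and runs the automaton bottom-up; since every left-hand side of the example has a single right-hand side, a plain dictionary lookup is enough. The encoding and the name `run` are assumptions of the sketch.

```python
BOT, OM = "bot", "Om"

RULES = {
    ((BOT, "a"), ()): "q",
    ((BOT, OM), ()): "q",
    ((BOT, "g"), ("q",)): "q",
    ((BOT, "f"), ("q", "q")): "q",
    ((OM, "a"), ()): "q'",
    ((OM, OM), ()): "q'",
    (("a", "a"), ()): "q'",
    ((OM, "g"), ("q",)): "q'",
    ((OM, "f"), ("q", "q")): "q'",
    (("g", "g"), ("q'",)): "q'",
    (("f", "f"), ("q'", "q'")): "q'",
}
FINAL = {"q'"}

def run(coded):
    """State reached on a coded term, or None when no rule applies."""
    symbol, children = coded[0], coded[1:]
    child_states = tuple(run(c) for c in children)
    return RULES.get((symbol, child_states))

# [t, u] = ff(gg(Om g(bot a)), gg(Om Om)), the overlap computed in the example
coded = (("f", "f"),
         (("g", "g"), ((OM, "g"), ((BOT, "a"),))),
         (("g", "g"), ((OM, OM),)))
accepted = run(coded) in FINAL
```

Here `accepted` is true, whereas the coding `(("a", "Om"),)` of the pair $(a, \Omega)$ has no applicable rule and is rejected, as expected since $a$ contains no occurrence of $\Omega$ to replace.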

The last example illustrates the recognition by a GTT. It comes from the
theory of rewriting; further developments and explanations on this theory are
given in Section 3.4.3.

Example 31. Let $\mathcal{F} = \{\times, +, 0, 1\}$. Let $\mathcal{R}$ be the
rewrite system $0 \times x \to 0$. The many-steps reduction relation defined by
$\mathcal{R}$, $\xrightarrow[\mathcal{R}]{*}$, is recognized by the GTT
$(\mathcal{A}_1, \mathcal{A}_2)$ defined as follows ($+$ and $\times$ are used
in infix notation to meet their usual reading):

$T_1$ = { $0 \to q_\top$      $q_\top + q_\top \to q_\top$
          $1 \to q_\top$      $q_\top \times q_\top \to q_\top$
          $0 \to q_0$         $q_0 \times q_\top \to q_0$ }

$T_2$ = { $0 \to q_0$ }

Then, for instance, the pair $(1 + ((0 \times 1) \times 1), 1 + 0)$ is accepted
by the GTT since

$1 + ((0 \times 1) \times 1) \xrightarrow[\mathcal{A}_1]{*} 1 + (q_0 \times q_\top) \times q_\top \xrightarrow[\mathcal{A}_1]{} 1 + (q_0 \times q_\top) \xrightarrow[\mathcal{A}_1]{} 1 + q_0$

on one hand and $1 + 0 \xrightarrow[\mathcal{A}_2]{} 1 + q_0$ on the other hand.

Figure 3.4: The relations between the three classes

3.2.3 Comparisons Between the Three Classes 

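We study here the inclusion relations between the three classes: $Rec_\times$,
$Rec$, GTT.

Proposition 8. $Rec_\times \subset Rec$ and the inclusion is strict.

Proof. To show that any relation in $Rec_\times$ is also in $Rec$, we have to
construct from two automata
$\mathcal{A}_1 = (Q_1, \mathcal{F}, Q_1^f, R_1)$,
$\mathcal{A}_2 = (Q_2, \mathcal{F}, Q_2^f, R_2)$ an automaton which recognizes
the overlaps of the terms in the languages (finite unions of such products are
then handled by the closure of recognizable tree languages under union). We
define such an automaton
$\mathcal{A} = (Q, (\mathcal{F} \cup \{\bot\})^2, Q^f, R)$ by:
$Q = (Q_1 \cup \{q_\bot\}) \times (Q_2 \cup \{q_\bot\})$,
$Q^f = Q_1^f \times Q_2^f$ and $R$ is the set of rules:

• $f\bot((q_1, q_\bot), \ldots, (q_n, q_\bot)) \to (q, q_\bot)$ if
$f(q_1, \ldots, q_n) \to q \in R_1$

• $\bot f((q_\bot, q_1), \ldots, (q_\bot, q_n)) \to (q_\bot, q)$ if
$f(q_1, \ldots, q_n) \to q \in R_2$

• $fg((q_1, q'_1), \ldots, (q_m, q'_m), (q_{m+1}, q_\bot), \ldots, (q_n, q_\bot)) \to (q, q')$
if $f(q_1, \ldots, q_n) \to q \in R_1$ and
$g(q'_1, \ldots, q'_m) \to q' \in R_2$ and $n \geq m$

• $fg((q_1, q'_1), \ldots, (q_n, q'_n), (q_\bot, q'_{n+1}), \ldots, (q_\bot, q'_m)) \to (q, q')$
if $f(q_1, \ldots, q_n) \to q \in R_1$ and
$g(q'_1, \ldots, q'_m) \to q' \in R_2$ and $m \geq n$

The proof that $\mathcal{A}$ indeed accepts
$L(\mathcal{A}_1) \times L(\mathcal{A}_2)$ is left to the reader.
Now, the inclusion is strict since e.g. $\Delta \in Rec \setminus Rec_\times$.
$\square$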



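Proposition 9. GTT $\subset Rec$ and the inclusion is strict.

Proof. Let $(\mathcal{A}_1, \mathcal{A}_2)$ be a GTT accepting $R$. We have to
construct an automaton $\mathcal{A}$ which accepts the codings of pairs in $R$.

Let $\mathcal{A}_0 = (Q_0, (\mathcal{F} \cup \{\bot\})^2, Q_0^f, T_0)$ be the
automaton constructed in the proof of Proposition 8:
$[t, u] \xrightarrow[\mathcal{A}_0]{*} (q_1, q_2)$ if and only if
$t \xrightarrow[\mathcal{A}_1]{*} q_1$ and
$u \xrightarrow[\mathcal{A}_2]{*} q_2$. Now we let
$\mathcal{A} = (Q_0 \cup \{q_f\}, (\mathcal{F} \cup \{\bot\})^2, Q_f = \{q_f\}, T)$.
$T$ consists of $T_0$ plus the following rules:

$(q, q) \to q_f$          $ff(q_f, \ldots, q_f) \to q_f$

for every symbol $f \in \mathcal{F}$ and every state $q$ such that
$(q, q) \in Q_0$.

If $(t, u)$ is accepted by the GTT, then

$t \xrightarrow[\mathcal{A}_1]{*} C[q_1, \ldots, q_n]_{p_1, \ldots, p_n} \xleftarrow[\mathcal{A}_2]{*} u$.

Then

$[t, u] \xrightarrow[\mathcal{A}_0]{*} [C, C][(q_1, q_1), \ldots, (q_n, q_n)]_{p_1, \ldots, p_n} \xrightarrow[\mathcal{A}]{*} [C, C][q_f, \ldots, q_f]_{p_1, \ldots, p_n} \xrightarrow[\mathcal{A}]{*} q_f$

Conversely, if $[t, u]$ is accepted by $\mathcal{A}$ then
$[t, u] \xrightarrow[\mathcal{A}]{*} q_f$. By definition of $\mathcal{A}$,
there should be a sequence:

$[t, u] \xrightarrow[\mathcal{A}_0]{*} C[(q_1, q_1), \ldots, (q_n, q_n)]_{p_1, \ldots, p_n} \xrightarrow[\mathcal{A}]{*} C[q_f, \ldots, q_f]_{p_1, \ldots, p_n} \xrightarrow[\mathcal{A}]{*} q_f$

Indeed, we let $p_i$ be the positions at which one of the $\epsilon$-transition
steps $(q, q) \to q_f$ is applied ($n \geq 0$). Now,
$C[q_f, \ldots, q_f]_{p_1, \ldots, p_n} \xrightarrow[\mathcal{A}]{*} q_f$ if and
only if $C$ can be written $[C_1, C_1]$ (the proof is left to the reader).

Concerning the strictness of the inclusion, it will be a consequence of
Propositions 8 and 10. $\square$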

Proposition 10. GTT $\not\subseteq Rec_\times$ and
$Rec_\times \not\subseteq$ GTT.

Proof. $\Delta$ is accepted by a GTT (with no state and no transition) but it
does not belong to $Rec_\times$. On the other hand, if
$\mathcal{F} = \{f, a\}$, then $\{a, f(a)\}^2$ is in $Rec_\times$ (it is the
product of two finite languages) but it is not accepted by any GTT since any
GTT accepts at least $\Delta$. $\square$

Finally, there is an example of a relation $R_c$ which is in $Rec$ and not in
the union $Rec_\times \cup$ GTT; consider for instance the alphabet
$\{a(), b(), 0\}$ and the one step reduction relation associated with the
rewrite system $a(x) \to x$. In other words,

$(u, v) \in R_c \iff \exists C \in \mathcal{C}(\mathcal{F}), \exists t \in T(\mathcal{F}),\ u = C[a(t)] \wedge v = C[t]$

It is left as an exercise to prove that
$R_c \in Rec \setminus (Rec_\times \cup \text{GTT})$.

3.2.4 Closure Properties for $Rec_\times$ and $Rec$; Cylindrification and Projection

Let us start with the classical closure properties.

Proposition 11. $Rec_\times$ and $Rec$ are closed under Boolean operations.

The proof of this proposition is straightforward and left as an exercise.
These relations are also closed under cylindrification and projection. Let us
first define these operations, which are specific to automata on tuples:

Definition 6. If $R \subseteq T(\mathcal{F})^n$ ($n \geq 1$) and
$1 \leq i \leq n$ then the $i$th projection of $R$ is the relation
$R_i \subseteq T(\mathcal{F})^{n-1}$ defined by

$R_i(t_1, \ldots, t_{n-1}) \iff \exists t \in T(\mathcal{F})\ R(t_1, \ldots, t_{i-1}, t, t_i, \ldots, t_{n-1})$

When $n = 1$, $T(\mathcal{F})^{n-1}$ is by convention a singleton set
$\{\top\}$ (so as to keep the property that
$T(\mathcal{F})^{n+1} = T(\mathcal{F}) \times T(\mathcal{F})^n$). $\{\top\}$ is
assumed to be a neutral element w.r.t. Cartesian product. In such a situation,
a relation $R \subseteq T(\mathcal{F})^0$ is either $\emptyset$ or $\{\top\}$
(it is a propositional variable).

Definition 7. If $R \subseteq T(\mathcal{F})^n$ ($n \geq 0$) and
$1 \leq i \leq n+1$, then the $i$th cylindrification of $R$ is the relation
$R^i \subseteq T(\mathcal{F})^{n+1}$ defined by

$R^i(t_1, \ldots, t_{i-1}, t, t_i, \ldots, t_n) \iff R(t_1, \ldots, t_{i-1}, t_i, \ldots, t_n)$

Proposition 12. $Rec_\times$ and $Rec$ are effectively closed under projection
and cylindrification. Actually, the $i$th projection can be computed in linear
time and the $i$th cylindrification of $\mathcal{A}$ can be computed in linear
time (assuming that the size of the alphabet is constant).

Proof. For $Rec_\times$, this property is easy: projection on the $i$th
component simply amounts to removing the $i$th automaton. Cylindrification on
the $i$th component simply amounts to inserting, as an $i$th automaton, an
automaton accepting all terms.

Assume that $R \in Rec$. The $i$th projection of $R$ is simply its image by the
following linear tree homomorphism:

$h_i([f_1, \ldots, f_n](t_1, \ldots, t_k)) = [f_1 \ldots f_{i-1} f_{i+1} \ldots f_n](h_i(t_1), \ldots, h_i(t_m))$

in which $m$ is the arity of $[f_1 \ldots f_{i-1} f_{i+1} \ldots f_n]$ (which is
smaller than or equal to $k$). Hence, by Theorem 6, the $i$th projection of $R$
is recognizable (and we can extract from the proof a linear construction of the
automaton).

Similarly, the $i$th cylindrification is obtained as an inverse homomorphic
image, hence is recognizable thanks to Theorem 7.

$\square$
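The homomorphism $h_i$ used for projection acts on coded terms by erasing one component of every product symbol and dropping the arguments that are no longer needed. The Python sketch below applies it to a coded term rather than to an automaton; it assumes $n \geq 2$, symbols of the product alphabet encoded as tuples, `"bot"` for $\bot$, and an `arity` dictionary for the symbols of $\mathcal{F}$ (all of these are conventions of the sketch).

```python
BOT = "bot"   # padding symbol, of arity 0

def project(coded, i, arity):
    """Image of a coded term under h_i: drop component i (1-based) of each
    product symbol and keep only the first m arguments, where m is the arity
    of the remaining product symbol."""
    symbols, children = coded[0], coded[1:]
    kept = symbols[:i - 1] + symbols[i:]
    m = max(arity.get(s, 0) for s in kept)         # BOT is absent: arity 0
    return (kept,) + tuple(project(c, i, arity) for c in children[:m])

ARITY = {"f": 2, "g": 1, "a": 0, "Om": 0}

# Coding of the pair (f(g(Om), g(Om)), f(g(g(a)), g(Om))) from Example 30
coded = (("f", "f"),
         (("g", "g"), (("Om", "g"), ((BOT, "a"),))),
         (("g", "g"), (("Om", "Om"),)))

second = project(coded, 1, ARITY)                  # erase the first component
first = project(coded, 2, ARITY)                   # erase the second component
```

Erasing the first component yields the (one-component) coding of $f(g(g(a)), g(\Omega))$ and erasing the second one yields that of $f(g(\Omega), g(\Omega))$; in the latter case the subterm under $\Omega g$ is cut because the remaining symbol $\Omega$ has arity 0.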

Note that using the above construction, the projection of a deterministic
automaton may be non-deterministic (see exercises).

Example 32. Let $\mathcal{F} = \{f, g, a\}$ where $f$ is binary, $g$ is unary
and $a$ is a constant. Consider the following automaton $\mathcal{A}$ on
$\mathcal{F}' = (\mathcal{F} \cup \{\bot\})^2$: the set of states is
$\{q_1, q_2, q_3, q_4, q_5\}$ and the set of final states is $\{q_3\}$ (see
footnote 1).

$a\bot \to q_1$           $f\bot(q_1, q_1) \to q_1$
$g\bot(q_1) \to q_1$      $fg(q_2, q_1) \to q_3$
$ga(q_1) \to q_2$         $f\bot(q_4, q_1) \to q_4$
$g\bot(q_1) \to q_4$      $fa(q_4, q_1) \to q_2$
$gg(q_3) \to q_3$         $ff(q_3, q_3) \to q_3$
$aa \to q_5$              $ff(q_3, q_5) \to q_3$
$gg(q_5) \to q_5$         $ff(q_5, q_3) \to q_3$
                          $ff(q_5, q_5) \to q_5$

Projecting away the first component of this automaton gives:

$a \to q_2$               $g(q_3) \to q_3$
$a \to q_5$               $g(q_5) \to q_5$
$g(q_2) \to q_3$          $f(q_3, q_3) \to q_3$
$f(q_3, q_5) \to q_3$     $f(q_5, q_5) \to q_5$
$f(q_5, q_3) \to q_3$

which accepts the terms containing $g(a)$ as a subterm (see footnote 2).

Footnote 1: This automaton accepts the set of pairs of terms $(u, v)$ such that
$u$ can be rewritten in one or more steps to $v$ by the rewrite system
$f(g(x), y) \to g(a)$.



3.2.5 Closure of GTT by Composition and Iteration

Theorem 23. If $R \subseteq T(\mathcal{F})^2$ is recognized by a GTT, then its
transitive closure $R^*$ is also recognized by a GTT.

The detailed proof is technical, so let us show it on a picture and explain it
informally.

We consider two pairs $(t, v)$ and $(v, u)$ which are both accepted by the GTT
and we wish that $(t, u)$ is also accepted. For simplicity, consider only one
state $q$ such that
$t \xrightarrow[\mathcal{A}_1]{*} C[q] \xleftarrow[\mathcal{A}_2]{*} v$ and
$v \xrightarrow[\mathcal{A}_1]{*} C'[q_1, \ldots, q_n] \xleftarrow[\mathcal{A}_2]{*} u$.
There are actually two cases: $C$ can be "bigger" than $C'$ or "smaller".
Assume it is smaller. Then $q$ is reached at a position inside $C'$:
$C' = C[C'']_p$. The situation is depicted on Figure 3.5. Along the reduction
of $v$ to $q$ by $\mathcal{A}_2$, we enter a configuration
$C''[q'_1, \ldots, q'_n]$. The idea now is to add to $\mathcal{A}_2$
$\epsilon$-transitions from $q_i$ to $q'_i$. In this way, as can easily be seen
on Figure 3.5, we get a reduction from $u$ to $C[q]$, hence the pair $(t, u)$
is accepted.

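Proof. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be the pair of automata defining
the GTT which accepts $R$. We compute by induction the automata
$\mathcal{A}_1^n$, $\mathcal{A}_2^n$. $\mathcal{A}_i^0 = \mathcal{A}_i$ and
$\mathcal{A}_i^{n+1}$ is obtained by adding new transitions to
$\mathcal{A}_i^n$. Let $Q_i$ be the set of states of $\mathcal{A}_i$ (and also
the set of states of $\mathcal{A}_i^n$).

• If $L_{\mathcal{A}_2^n}(q) \cap L_{\mathcal{A}_1^n}(q') \neq \emptyset$,
$q \in Q_1 \cap Q_2$ and $q \not\xrightarrow[\mathcal{A}_1^n]{*} q'$, then
$\mathcal{A}_1^{n+1}$ is obtained from $\mathcal{A}_1^n$ by adding the
$\epsilon$-transition $q \to q'$ and $\mathcal{A}_2^{n+1} = \mathcal{A}_2^n$.

• If $L_{\mathcal{A}_1^n}(q) \cap L_{\mathcal{A}_2^n}(q') \neq \emptyset$,
$q \in Q_1 \cap Q_2$ and $q \not\xrightarrow[\mathcal{A}_2^n]{*} q'$, then
$\mathcal{A}_2^{n+1}$ is obtained from $\mathcal{A}_2^n$ by adding the
$\epsilon$-transition $q \to q'$ and $\mathcal{A}_1^{n+1} = \mathcal{A}_1^n$.

If there are several ways of obtaining $\mathcal{A}_i^{n+1}$ from
$\mathcal{A}_i^n$ using these rules, we don't care which of these ways is used.

First, these completion rules are decidable by the decision properties of
chapter 1. Their application also terminates, as each application strictly
decreases $k_1(n) + k_2(n)$ where $k_i(n)$ is the number of pairs of states
$(q, q') \in (Q_1 \cup Q_2) \times (Q_1 \cup Q_2)$ such that there is no
$\epsilon$-transition in $\mathcal{A}_i^n$ from $q$ to $q'$. We let
$\mathcal{A}_i^*$ be a fixed point of this computation. We show that
$(\mathcal{A}_1^*, \mathcal{A}_2^*)$ defines a GTT accepting $R^*$.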

• Each pair of terms accepted by the GTT $(\mathcal{A}_1^*, \mathcal{A}_2^*)$
is in $R^*$: we show by induction on $n$ that each pair of terms accepted by
the GTT $(\mathcal{A}_1^n, \mathcal{A}_2^n)$ is in $R^*$. For $n = 0$, this
follows from the hypothesis. Let us now assume that $\mathcal{A}_1^{n+1}$ is
obtained by adding $q \to q'$ to the transitions of $\mathcal{A}_1^n$ (the
other case is symmetric). Let $(t, u)$ be accepted by the GTT
$(\mathcal{A}_1^{n+1}, \mathcal{A}_2^{n+1})$. By definition, there is a context
$C$ and positions $p_1, \ldots, p_k$ such that
$t = C[t_1, \ldots, t_k]_{p_1, \ldots, p_k}$,
$u = C[u_1, \ldots, u_k]_{p_1, \ldots, p_k}$ and there are states
$q_1, \ldots, q_k \in Q_1 \cap Q_2$ such that, for all $i$,
$t_i \xrightarrow[\mathcal{A}_1^{n+1}]{*} q_i$ and
$u_i \xrightarrow[\mathcal{A}_2^n]{*} q_i$.

We prove the result by induction on the number $m$ of times $q \to q'$ is
applied in the reductions $t_i \xrightarrow[\mathcal{A}_1^{n+1}]{*} q_i$. If
$m = 0$, then this is the first induction hypothesis: $(t, u)$ is accepted by
$(\mathcal{A}_1^n, \mathcal{A}_2^n)$, hence $(t, u) \in R^*$. Now, assume that,
for some $i$,

$t_i \xrightarrow[\mathcal{A}_1^{n+1}]{*} t_i[q]_p \xrightarrow[q \to q']{} t_i[q']_p \xrightarrow[\mathcal{A}_1^{n+1}]{*} q_i$

By definition, there is a term $v$ such that
$v \xrightarrow[\mathcal{A}_2^n]{*} q$ and
$v \xrightarrow[\mathcal{A}_1^n]{*} q'$. Hence

$t_i[v]_p \xrightarrow[\mathcal{A}_1^{n+1}]{*} q_i$

And the number of reduction steps using $q \to q'$ is strictly smaller here
than in the reduction from $t_i$ to $q_i$. Hence, by induction hypothesis,
$(t[v]_{p_i p}, u) \in R^*$. On the other hand, $(t, t[v]_{p_i p})$ is accepted
by the GTT $(\mathcal{A}_1^{n+1}, \mathcal{A}_2^n)$ since
$t|_{p_i p} \xrightarrow[\mathcal{A}_1^{n+1}]{*} q$ and
$v \xrightarrow[\mathcal{A}_2^n]{*} q$. Moreover, by construction, the first
sequence of reductions uses strictly less than $m$ times the transition
$q \to q'$. Then, by induction hypothesis, $(t, t[v]_{p_i p}) \in R^*$. Now
from $(t, t[v]_{p_i p}) \in R^*$ and $(t[v]_{p_i p}, u) \in R^*$, we conclude
$(t, u) \in R^*$.

Footnote 2: i.e. the terms that are obtained by applying at least one rewriting
step using $f(g(x), y) \to g(a)$.

TATA, September 6, 2005

Figure 3.5: The proof of Theorem 23

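• If $(t, u) \in R^*$, then $(t, u)$ is accepted by the GTT $(\mathcal{A}_1^*, \mathcal{A}_2^*)$. Let us prove the following intermediate result:

Lemma 1. If

$t \xrightarrow[\mathcal{A}_1^*]{*} q$, $\quad v \xrightarrow[\mathcal{A}_2^*]{*} q$,
$v \xrightarrow[\mathcal{A}_1^*]{*} C[q_1, \ldots, q_k]_{p_1,\ldots,p_k}$, $\quad u \xrightarrow[\mathcal{A}_2^*]{*} C[q_1, \ldots, q_k]_{p_1,\ldots,p_k}$,

then $u \xrightarrow[\mathcal{A}_2^*]{*} q$ and hence $(t, u)$ is accepted by the GTT.

Let $v \xrightarrow[\mathcal{A}_2^*]{*} C[q'_1, \ldots, q'_k]_{p_1,\ldots,p_k} \xrightarrow[\mathcal{A}_2^*]{*} q$. For each $i$, $v|_{p_i} \in L_{\mathcal{A}_2^*}(q'_i) \cap L_{\mathcal{A}_1^*}(q_i)$ and $q_i \in Q_1 \cap Q_2$. Hence, by construction, $q_i \xrightarrow[\mathcal{A}_2^*]{*} q'_i$. It follows that

$u \xrightarrow[\mathcal{A}_2^*]{*} C[q_1, \ldots, q_k]_{p_1,\ldots,p_k} \xrightarrow[\mathcal{A}_2^*]{*} C[q'_1, \ldots, q'_k]_{p_1,\ldots,p_k} \xrightarrow[\mathcal{A}_2^*]{*} q$

which proves our lemma.

Symmetrically, if $t \xrightarrow[\mathcal{A}_1^*]{*} C[q_1, \ldots, q_k]_{p_1,\ldots,p_k}$, $v \xrightarrow[\mathcal{A}_2^*]{*} C[q_1, \ldots, q_k]_{p_1,\ldots,p_k}$, $v \xrightarrow[\mathcal{A}_1^*]{*} q$ and $u \xrightarrow[\mathcal{A}_2^*]{*} q$, then $t \xrightarrow[\mathcal{A}_1^*]{*} q$.

Now, let $(t, u) \in R^n$: we prove that $(t, u)$ is accepted by the GTT $(\mathcal{A}_1^*, \mathcal{A}_2^*)$ by induction on $n$. If $n = 1$, then the result follows from the inclusion of $L(\mathcal{A}_1, \mathcal{A}_2)$ in $L(\mathcal{A}_1^*, \mathcal{A}_2^*)$. Now, let $v$ be such that $(t, v) \in R$ and $(v, u) \in R^{n-1}$. By induction hypothesis, both $(t, v)$ and $(v, u)$ are accepted by the GTT $(\mathcal{A}_1^*, \mathcal{A}_2^*)$: there are contexts $C$ and $C'$ and positions $p_1, \ldots, p_k, p'_1, \ldots, p'_m$ such that

$t = C[t_1, \ldots, t_k]_{p_1,\ldots,p_k}$, $\quad v = C[v_1, \ldots, v_k]_{p_1,\ldots,p_k}$

$v = C'[v'_1, \ldots, v'_m]_{p'_1,\ldots,p'_m}$, $\quad u = C'[u_1, \ldots, u_m]_{p'_1,\ldots,p'_m}$

and states $q_1, \ldots, q_k, q'_1, \ldots, q'_m \in Q_1 \cap Q_2$ such that for all $i, j$: $t_i \xrightarrow[\mathcal{A}_1^*]{*} q_i$, $v_i \xrightarrow[\mathcal{A}_2^*]{*} q_i$, $v'_j \xrightarrow[\mathcal{A}_1^*]{*} q'_j$, $u_j \xrightarrow[\mathcal{A}_2^*]{*} q'_j$. Let $C''$ be the largest context more general than $C$ and $C'$; the positions of $C''$ are the positions of both $C[q_1, \ldots, q_k]_{p_1,\ldots,p_k}$ and $C'[q'_1, \ldots, q'_m]_{p'_1,\ldots,p'_m}$. $C''$ and $p''_1, \ldots, p''_l$ are such that:

- For each $1 \leq i \leq l$, there is a $j$ such that either $p_j = p''_i$ or $p'_j = p''_i$

- For all $1 \leq i \leq k$ there is a $j$ such that $p_i \geq p''_j$

- For all $1 \leq i \leq m$ there is a $j$ such that $p'_i \geq p''_j$

- the positions $p''_j$ are pairwise incomparable w.r.t. the prefix ordering.

Let us fix a $j \in [1..l]$. Assume that $p''_j = p_i$ (the other case is symmetric). We can apply our lemma to $t_j = t|_{p''_j}$ (in place of $t$), $v_j = v|_{p''_j}$ (in place of $v$) and $u|_{p''_j}$ (in place of $u$), showing that $u|_{p''_j} \xrightarrow[\mathcal{A}_2^*]{*} q_i$. If we let now $q''_j = q_i$ when $p''_j = p_i$ and $q''_j = q'_i$ when $p''_j = p'_i$, we get

$t \xrightarrow[\mathcal{A}_1^*]{*} C''[q''_1, \ldots, q''_l]_{p''_1,\ldots,p''_l} \xleftarrow[\mathcal{A}_2^*]{*} u$

which completes the proof.

□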

Proposition 13. If $R$ and $R'$ are in GTT then their composition $R \circ R'$ is also in GTT.

Proof. Let $(\mathcal{A}_1, \mathcal{A}_2)$ and $(\mathcal{A}'_1, \mathcal{A}'_2)$ be the two pairs of automata which recognize $R$ and $R'$ respectively. We assume without loss of generality that the sets of states are disjoint:

$(Q_1 \cup Q_2) \cap (Q'_1 \cup Q'_2) = \emptyset$

We define the automaton $\mathcal{A}_1^*$ as follows: the set of states is $Q_1 \cup Q'_1$ and the transitions are the union of:

• the transitions of $\mathcal{A}_1$

• the transitions of $\mathcal{A}'_1$

• the $\varepsilon$-transitions $q \to q'$ if $q \in Q_1 \cap Q_2$, $q' \in Q'_1$ and $L_{\mathcal{A}_2}(q) \cap L_{\mathcal{A}'_1}(q') \neq \emptyset$

Figure 3.6: The proof of Proposition 13



Symmetrically, the automaton $\mathcal{A}_2^*$ is defined by: its states are $Q_2 \cup Q'_2$ and the transitions are:

• the transitions of $\mathcal{A}_2$

• the transitions of $\mathcal{A}'_2$

• the $\varepsilon$-transitions $q' \to q$ if $q' \in Q'_1 \cap Q'_2$, $q \in Q_2$ and $L_{\mathcal{A}'_1}(q') \cap L_{\mathcal{A}_2}(q) \neq \emptyset$

We prove below that $(\mathcal{A}_1^*, \mathcal{A}_2^*)$ is a GTT recognizing $R \circ R'$. See also Figure 3.6.

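• Assume first that $(u, v) \in R \circ R'$. Then there is a term $w$ such that $(u, w) \in R$ and $(w, v) \in R'$:

$u = C[u_1, \ldots, u_k]_{p_1,\ldots,p_k}$, $\quad w = C[w_1, \ldots, w_k]_{p_1,\ldots,p_k}$

$w = C'[w'_1, \ldots, w'_m]_{p'_1,\ldots,p'_m}$, $\quad v = C'[v_1, \ldots, v_m]_{p'_1,\ldots,p'_m}$

and, for every $i \in \{1, \ldots, k\}$, $u_i \xrightarrow[\mathcal{A}_1]{*} q_i$, $w_i \xrightarrow[\mathcal{A}_2]{*} q_i$, for every $i \in \{1, \ldots, m\}$, $w'_i \xrightarrow[\mathcal{A}'_1]{*} q'_i$, $v_i \xrightarrow[\mathcal{A}'_2]{*} q'_i$. Let $p''_1, \ldots, p''_l$ be the minimal elements (w.r.t. the prefix ordering) of the set $\{p_1, \ldots, p_k\} \cup \{p'_1, \ldots, p'_m\}$. Each $p''_i$ is either some $p_j$ or some $p'_j$. Assume first $p''_i = p_j$. Then $p_j$ is a position in $C'$ and

$C'[q'_1, \ldots, q'_m]_{p'_1,\ldots,p'_m}|_{p_j} = C_j[q'_{m_j+1}, \ldots, q'_{m_j+k_j}]_{p'_{m_j+1},\ldots,p'_{m_j+k_j}}$.

Now, $w_j \xrightarrow[\mathcal{A}_2]{*} q_j$ and

$w_j = C_j[w'_{m_j+1}, \ldots, w'_{m_j+k_j}]_{p'_{m_j+1},\ldots,p'_{m_j+k_j}}$

with $w'_{m_j+i} \xrightarrow[\mathcal{A}'_1]{*} q'_{m_j+i}$ for every $i \in \{1, \ldots, k_j\}$. For $i \in \{1, \ldots, k_j\}$, let $q_{j,i}$ be such that:

$w'_{m_j+i} = w_j|_{p'_{m_j+i}} \xrightarrow[\mathcal{A}_2]{*} q_{j,i}$

$C_j[q_{j,1}, \ldots, q_{j,k_j}]_{p'_{m_j+1},\ldots,p'_{m_j+k_j}} \xrightarrow[\mathcal{A}_2]{*} q_j$

For every $i$, $w'_{m_j+i} \in L_{\mathcal{A}_2}(q_{j,i}) \cap L_{\mathcal{A}'_1}(q'_{m_j+i})$ and $q'_{m_j+i} \in Q'_1 \cap Q'_2$. Then, by definition, there is a transition $q'_{m_j+i} \xrightarrow[\mathcal{A}_2^*]{} q_{j,i}$. Therefore, $C_j[q'_{m_j+1}, \ldots, q'_{m_j+k_j}] \xrightarrow[\mathcal{A}_2^*]{*} q_j$ and then $v|_{p_j} \xrightarrow[\mathcal{A}_2^*]{*} q_j$.

Now, if $p''_i = p'_j$, we get, in a similar way, $u|_{p'_j} \xrightarrow[\mathcal{A}_1^*]{*} q'_j$. Altogether:

$u \xrightarrow[\mathcal{A}_1^*]{*} C''[q''_1, \ldots, q''_l]_{p''_1,\ldots,p''_l} \xleftarrow[\mathcal{A}_2^*]{*} v$

where $C''$ is the common part of $C$ and $C'$ above the positions $p''_1, \ldots, p''_l$, $q''_i = q_j$ if $p''_i = p_j$ and $q''_i = q'_j$ if $p''_i = p'_j$.

• Conversely, assume that $(u, v)$ is accepted by $(\mathcal{A}_1^*, \mathcal{A}_2^*)$. Then

$u \xrightarrow[\mathcal{A}_1^*]{*} C[q''_1, \ldots, q''_l]_{p''_1,\ldots,p''_l} \xleftarrow[\mathcal{A}_2^*]{*} v$

and, for every $i$, either $q''_i \in Q_1 \cap Q_2$ or $q''_i \in Q'_1 \cap Q'_2$ (by the disjointness hypothesis). Assume for instance that $q''_i \in Q'_1 \cap Q'_2$ and consider the computation of $\mathcal{A}_1^*$: $u|_{p''_i} \xrightarrow[\mathcal{A}_1^*]{*} q''_i$. By definition, $u|_{p''_i} = C_i[u_1, \ldots, u_{k_i}]$ with

$u_j \xrightarrow[\mathcal{A}_1]{*} q_j \xrightarrow[\mathcal{A}_1^*]{} q'_j$

for every $j = 1, \ldots, k_i$, and $C_i[q'_1, \ldots, q'_{k_i}] \xrightarrow[\mathcal{A}'_1]{*} q''_i$. By construction, $q_j \in Q_1 \cap Q_2$ and $L_{\mathcal{A}_2}(q_j) \cap L_{\mathcal{A}'_1}(q'_j) \neq \emptyset$. Let $w_{i,j}$ be a term in this intersection and $w_i = C_i[w_{i,1}, \ldots, w_{i,k_i}]$. Then

$w_i \xrightarrow[\mathcal{A}_2]{*} C_i[q_1, \ldots, q_{k_i}]$, $\quad u|_{p''_i} \xrightarrow[\mathcal{A}_1]{*} C_i[q_1, \ldots, q_{k_i}]$

$w_i \xrightarrow[\mathcal{A}'_1]{*} q''_i$, $\quad v|_{p''_i} \xrightarrow[\mathcal{A}'_2]{*} q''_i$

The last property comes from the fact that $v|_{p''_i} \xrightarrow[\mathcal{A}_2^*]{*} q''_i$ and, since $q''_i \in Q'_2$, there can be only transition steps from $\mathcal{A}'_2$ in this reduction.

Symmetrically, if $q''_i \in Q_1 \cap Q_2$, then we define $w_i$ and the contexts $C_i$ such that

$w_i \xrightarrow[\mathcal{A}_2]{*} q''_i$, $\quad u|_{p''_i} \xrightarrow[\mathcal{A}_1]{*} q''_i$

$w_i \xrightarrow[\mathcal{A}'_1]{*} C_i[q'_1, \ldots, q'_{k_i}]$, $\quad v|_{p''_i} \xrightarrow[\mathcal{A}'_2]{*} C_i[q'_1, \ldots, q'_{k_i}]$

Finally, letting $w = C[w_1, \ldots, w_l]$, we have $(u, w) \in R$ and $(w, v) \in R'$.

□

GTTs do not have many other good closure properties (see the exercises).

3.3 The Logic WSkS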

3.3.1 Syntax 

Terms of WSkS are formed out of the constant $\varepsilon$, first-order variable symbols (typically written with lower-case letters $x, y, z, x', x_1, \ldots$) and unary symbols $1, \ldots, k$ written in postfix notation. For instance $x1123$, $\varepsilon 2111$ are terms. The latter will often be written omitting $\varepsilon$ (e.g. $2111$ instead of $\varepsilon 2111$).

Atomic formulas are either equalities $s = t$ between terms, inequalities $s \leq t$ or $s \geq t$ between terms, or membership constraints $t \in X$ where $t$ is a term and $X$ is a second-order variable symbol. Second-order variables will be typically denoted using upper-case letters.

Formulas are built from the atomic formulas using the logical connectives $\wedge, \vee, \neg, \Rightarrow, \Leftarrow, \Leftrightarrow$ and the quantifiers $\exists x, \forall x$ (quantification on individuals), $\exists X, \forall X$ (quantification on sets); we may quantify both first-order and second-order variables.

As usual, we do not need all this artillery: we may stick to a subset of logical connectives (and even a subset of atomic formulas as will be discussed in Section 3.3.4). For instance $\phi \Leftrightarrow \psi$ is an abbreviation for $(\phi \Rightarrow \psi) \wedge (\psi \Rightarrow \phi)$, $\phi \Rightarrow \psi$ is another way of writing $\psi \Leftarrow \phi$, $\phi \Rightarrow \psi$ is an abbreviation for $(\neg\phi) \vee \psi$, $\forall x.\phi$ stands for $\neg\exists x.\neg\phi$, etc. We will use the extended syntax for convenience, but we will restrict ourselves to the atomic formulas $s = t$, $s \leq t$, $t \in X$ and the logical connectives $\vee, \neg, \exists x, \exists X$ in the proofs.

The set of free variables of a formula $\phi$ is defined as usual.

3.3.2 Semantics 

We consider the particular interpretation where terms are strings belonging to $\{1, \ldots, k\}^*$, $=$ is the equality of strings, and $\leq$ is interpreted as the prefix ordering. Second-order variables are interpreted as finite subsets of $\{1, \ldots, k\}^*$, so $\in$ is then the membership predicate.

Let $t_1, \ldots, t_n \in \{1, \ldots, k\}^*$ and $S_1, \ldots, S_m$ be finite subsets of $\{1, \ldots, k\}^*$. Given a formula

$\phi(x_1, \ldots, x_n, X_1, \ldots, X_m)$

with free variables $x_1, \ldots, x_n, X_1, \ldots, X_m$, the assignment $\sigma = \{x_1 \mapsto t_1, \ldots, x_n \mapsto t_n, X_1 \mapsto S_1, \ldots, X_m \mapsto S_m\}$ satisfies $\phi$, which is written $\sigma \models \phi$ (or also $t_1, \ldots, t_n, S_1, \ldots, S_m \models \phi$), if, replacing the variables with their corresponding values, the formula holds in the above model.

Remark: the logic SkS is defined as above, except that set variables may be interpreted as infinite sets.

3.3.3 Examples 

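We list below a number of formulas defining predicates on sets and singletons. After these examples, we may use the below-defined abbreviations as if they were primitives of the logic.

X is a subset of Y:

$X \subseteq Y \stackrel{\text{def}}{=} \forall x.(x \in X \Rightarrow x \in Y)$

Finite union:

$X = \bigcup_{i=1}^{n} X_i \stackrel{\text{def}}{=} \bigwedge_{i=1}^{n} X_i \subseteq X \wedge \forall x.(x \in X \Rightarrow \bigvee_{i=1}^{n} x \in X_i)$

Intersection:

$X \cap Y = Z \stackrel{\text{def}}{=} \forall x.\, x \in Z \Leftrightarrow (x \in X \wedge x \in Y)$

Partition:

$\mathrm{Partition}(X, X_1, \ldots, X_n) \stackrel{\text{def}}{=} X = \bigcup_{i=1}^{n} X_i \wedge \bigwedge_{i=1}^{n-1} \bigwedge_{j=i+1}^{n} X_i \cap X_j = \emptyset$

X is closed under prefix:

$\mathrm{PrefixClosed}(X) \stackrel{\text{def}}{=} \forall z.\forall y.(z \in X \wedge y \leq z) \Rightarrow y \in X$

Set equality:

$Y = X \stackrel{\text{def}}{=} Y \subseteq X \wedge X \subseteq Y$

Emptiness:

$X = \emptyset \stackrel{\text{def}}{=} \forall Y.(Y \subseteq X \Rightarrow Y = X)$

X is a singleton:

$\mathrm{Sing}(X) \stackrel{\text{def}}{=} X \neq \emptyset \wedge \forall Y.(Y \subseteq X \Rightarrow (Y = X \vee Y = \emptyset))$

The prefix ordering:

$x \leq y \stackrel{\text{def}}{=} \forall X.\,(y \in X \wedge (\forall z.(\bigvee_{i=1}^{k} zi \in X) \Rightarrow z \in X)) \Rightarrow x \in X$

"every set containing y and closed by predecessor contains x"

This shows that $\leq$ can be removed from the syntax of WSkS formulas without decreasing the expressive power of the logic.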
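The second-order definition of the prefix ordering can be checked on concrete words. The sketch below is only an illustration, not part of the original text: it assumes that words over {1, ..., k} are written as Python strings of digits (so k ≤ 9), and the helper names `predecessor_closure` and `is_prefix` are hypothetical. It relies on the observation that "every predecessor-closed set containing y contains x" is equivalent to "x belongs to the smallest such set".

```python
def predecessor_closure(y):
    """Smallest set that contains the word y and is closed under predecessor,
    i.e. contains z whenever it contains some z·i."""
    closed = {y}
    while y:
        y = y[:-1]          # the predecessor of z·i is z
        closed.add(y)
    return closed


def is_prefix(x, y):
    """x <= y, read off the second-order definition: x belongs to the
    smallest predecessor-closed set containing y."""
    return x in predecessor_closure(y)
```

On such strings, `is_prefix(x, y)` should agree with the built-in test `y.startswith(x)`, which is the usual direct definition of the prefix ordering.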

Coding of trees: assume that $k$ is the maximal arity of a function symbol in $\mathcal{F}$. If $t \in T(\mathcal{F})$, $C(t)$ is the tuple of sets $(S, S_{f_1}, \ldots, S_{f_n})$ if $\mathcal{F} = \{f_1, \ldots, f_n\}$, $S = \bigcup_{i=1}^{n} S_{f_i}$ and $S_{f_i}$ is the set of positions in $t$ which are labeled with $f_i$.

For instance $C(f(g(a), f(a, b)))$ is the tuple $S = \{\varepsilon, 1, 11, 2, 21, 22\}$, $S_f = \{\varepsilon, 2\}$, $S_g = \{1\}$, $S_a = \{11, 21\}$, $S_b = \{22\}$.
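The coding can be computed by a single traversal of the term. The following is a minimal sketch under assumptions that are not in the text: terms are nested Python tuples `(symbol, child_1, ..., child_n)`, positions are strings of digits with the empty string standing for ε (so arities are at most 9), and the function name `coding` is hypothetical.

```python
def coding(term, position=""):
    """Map each function symbol to the set of positions it labels in `term`."""
    symbol, *children = term
    sets = {symbol: {position}}
    for i, child in enumerate(children, start=1):
        # the i-th child of the node at `position` sits at position·i
        for f, positions in coding(child, position + str(i)).items():
            sets.setdefault(f, set()).update(positions)
    return sets


# The example of the text: t = f(g(a), f(a, b))
t = ("f", ("g", ("a",)), ("f", ("a",), ("b",)))
S_f = coding(t)
S = set().union(*S_f.values())   # the set of all positions of t
```

Here `S_f["f"]` is the set written $S_f$ above, and `S` is the union of all the label sets.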

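"$(S, S_{f_1}, \ldots, S_{f_n})$ is the coding of some $t \in T(\mathcal{F})$" is defined by:

$\mathrm{Term}(X, X_1, \ldots, X_n) \stackrel{\text{def}}{=} X \neq \emptyset$

$\quad \wedge\ \mathrm{Partition}(X, X_1, \ldots, X_n) \wedge \mathrm{PrefixClosed}(X)$

$\quad \wedge\ \bigwedge_{i=1}^{k} \bigwedge_{a(f_j) = i} \Big( \bigwedge_{l=1}^{i} \forall x.(x \in X_{f_j} \Rightarrow xl \in X) \ \wedge\ \bigwedge_{l=i+1}^{k} \forall y.(y \in X_{f_j} \Rightarrow yl \notin X) \Big)$

3.3.4 Restricting the Syntax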

If we consider that a first-order variable is a singleton set, it is possible to transform any formula into an equivalent one which does not contain any first-order variable.

More precisely, we consider now that formulas are built upon the atomic formulas:

$X \subseteq Y$, $\mathrm{Sing}(X)$, $X = Yi$, $X = \varepsilon$

using the logical connectives and second-order quantification only. Let us call this new syntax the restricted syntax.

These formulas are interpreted as expected. In particular $\mathrm{Sing}(X)$ holds true when $X$ is a singleton set and $X = Yi$ holds true when $X$ and $Y$ are singleton sets $\{s\}$ and $\{t\}$ respectively and $s = ti$. Let us write $\models_2$ the satisfaction relation for this new logic.

Proposition 14. There is a translation $T$ from WSkS formulas to the restricted syntax such that

$s_1, \ldots, s_n, S_1, \ldots, S_m \models \phi(x_1, \ldots, x_n, X_1, \ldots, X_m)$

if and only if

$\{s_1\}, \ldots, \{s_n\}, S_1, \ldots, S_m \models_2 T(\phi)(X_{x_1}, \ldots, X_{x_n}, X_1, \ldots, X_m)$

Conversely, there is a translation $T'$ from the restricted syntax to WSkS such that

$S_1, \ldots, S_m \models T'(\phi)(X_1, \ldots, X_m)$

if and only if

$S_1, \ldots, S_m \models_2 \phi(X_1, \ldots, X_m)$

Proof. First, according to the previous section, we can restrict our attention to formulas built upon the only atomic formulas $t \in X$ and $s = t$. Then, each atomic formula is flattened according to the rules:

$ti \in X \ \to\ \exists y.\, y = ti \wedge y \in X$

$xi = yj \ \to\ \exists z.\, z = xi \wedge z = yj$

$ti = s \ \to\ \exists z.\, z = t \wedge zi = s$

The last rule assumes that $t$ is not a variable.

Next, we associate a second-order variable $X_y$ to each first-order variable $y$ and transform the flat atomic formulas:

$T(y \in X) \stackrel{\text{def}}{=} X_y \subseteq X$

$T(y = xi) \stackrel{\text{def}}{=} X_y = X_x i$

$T(x = \varepsilon) \stackrel{\text{def}}{=} X_x = \varepsilon$

$T(x = y) \stackrel{\text{def}}{=} X_x = X_y$

The translation of other flat atomic formulas can be derived from these ones, in particular when exchanging the arguments of $=$.

Now, $T(\phi \vee \psi) \stackrel{\text{def}}{=} T(\phi) \vee T(\psi)$, $T(\neg\phi) \stackrel{\text{def}}{=} \neg T(\phi)$, $T(\exists X.\phi) \stackrel{\text{def}}{=} \exists X.T(\phi)$, $T(\exists y.\phi) \stackrel{\text{def}}{=} \exists X_y.\, \mathrm{Sing}(X_y) \wedge T(\phi)$. Finally, we add $\mathrm{Sing}(X_x)$ for each free variable $x$.

For the converse, the translation $T'$ has been given in the previous section, except for the atomic formulas $X = Yi$ (which becomes $\mathrm{Sing}(X) \wedge \mathrm{Sing}(Y) \wedge \exists x \exists y.\, x \in X \wedge y \in Y \wedge x = yi$) and $X = \varepsilon$ (which becomes $\mathrm{Sing}(X) \wedge \forall x.\, x \in X \Rightarrow x = \varepsilon$).

□

3.3.5 Definable Sets are Recognizable Sets 

Definition 8. A set $L$ of tuples of finite sets of words is definable in WSkS if there is a formula $\phi$ of WSkS with free variables $X_1, \ldots, X_n$ such that

$(S_1, \ldots, S_n) \in L$ if and only if $S_1, \ldots, S_n \models \phi$.

Each tuple of finite sets of words $S_1, \ldots, S_n \subseteq \{1, \ldots, k\}^*$ is identified with a finite tree $(S_1, \ldots, S_n)^\sim$ over the alphabet $\{0, 1, \bot\}^n$, where any string containing a 0 or a 1 is $k$-ary and $\bot^n$ is a constant symbol, in the following way³:

$\mathrm{Pos}((S_1, \ldots, S_n)^\sim) \stackrel{\text{def}}{=} \{\varepsilon\} \cup \{pi \mid \exists p' \in \bigcup_{i=1}^{n} S_i,\ p \leq p',\ i \in \{1, \ldots, k\}\}$

is the set of prefixes of words in some $S_i$. The symbol at position $p$:

$(S_1, \ldots, S_n)^\sim(p) = \alpha_1 \ldots \alpha_n$

is defined as follows:

• $\alpha_i = 1$ if and only if $p \in S_i$

• $\alpha_i = 0$ if and only if $p \notin S_i$ and $\exists p' \in S_i$ and $\exists p''.\ p \cdot p'' = p'$

• $\alpha_i = \bot$ otherwise.

Example 33. Consider for instance $S_1 = \{\varepsilon, 11\}$, $S_2 = \emptyset$, $S_3 = \{11, 22\}$, three subsets of $\{1, 2\}^*$. Then the coding $(S_1, S_2, S_3)^\sim$ is depicted in Figure 3.7.
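The labels of this tree can be recomputed mechanically from the definition of Pos and of the symbols $\alpha_i$. The sketch below makes assumptions that are not in the text: words are Python strings of digits with the empty string for ε, the tree is returned as a dictionary from positions to labels, and the names `tuple_coding` and `tree` are hypothetical.

```python
def prefixes(word):
    """All prefixes of a word, including the empty word and the word itself."""
    return {word[:i] for i in range(len(word) + 1)}


def tuple_coding(sets, k):
    """Return {position: label} for the tree coding of a tuple of finite sets."""
    all_prefixes = set()
    for s in sets:
        for w in s:
            all_prefixes |= prefixes(w)
    # Pos = {epsilon} plus every successor p·i of a prefix p of some word
    positions = {""} | {p + str(i) for p in all_prefixes for i in range(1, k + 1)}

    def letter(p, s):
        if p in s:
            return "1"
        if any(w.startswith(p) for w in s):   # p is a proper prefix of a word of s
            return "0"
        return "⊥"

    return {p: "".join(letter(p, s) for s in sets) for p in positions}


# Example 33: S1 = {ε, 11}, S2 = ∅, S3 = {11, 22} over {1, 2}*
tree = tuple_coding([{"", "11"}, set(), {"11", "22"}], k=2)
```

With this input the root `tree[""]` carries the label `1⊥0`, since ε belongs to $S_1$, $S_2$ is empty, and ε is a proper prefix of a word of $S_3$.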



Lemma 2. If a set $L$ of tuples of finite subsets of $\{1, \ldots, k\}^*$ is definable in WSkS, then $L^\sim = \{(S_1, \ldots, S_n)^\sim \mid (S_1, \ldots, S_n) \in L\}$ is in Rec.

Proof. By Proposition 14, if $L$ is definable in WSkS, it is also definable with the restricted syntax. We now prove the lemma by induction on the structure of the formula $\phi$ which defines $L$. We assume that all variables in $\phi$ are bound at most once in the formula and we also assume a fixed total ordering $\leq$ on the variables. If $\psi$ is a subformula of $\phi$ with free variables $Y_1 < \ldots < Y_n$, we construct an automaton $\mathcal{A}_\psi$ working on the alphabet $\{0, 1, \bot\}^n$ such that $(S_1, \ldots, S_n) \models_2 \psi$ if and only if $(S_1, \ldots, S_n)^\sim \in L(\mathcal{A}_\psi)$.

³ This is very similar to the coding of Section 3.2.1.

Figure 3.7: An example of a tree coding a triple of finite sets of strings

Figure 3.8: The automaton for Sing(X). Its transition rules are $\bot \to q$, $1(q, q) \to q'$, $0(q, q') \to q'$, $0(q', q) \to q'$.



The base case consists in constructing an automaton for each atomic formula. (We assume here that $k = 2$ for simplicity, but this works of course for arbitrary $k$.)

The automaton $\mathcal{A}_{\mathrm{Sing}(X)}$ is depicted in Figure 3.8. The only final state is $q'$.

The automaton $\mathcal{A}_{X \subseteq Y}$ (with $X < Y$) is depicted in Figure 3.9. The only state (which is also final) is $q$.

The automaton $\mathcal{A}_{X = Y1}$ is depicted in Figure 3.10. The only final state is $q''$. An automaton for $X = Y2$ is obtained in a similar way.

The automaton for $X = \varepsilon$ is depicted in Figure 3.11 (the final state is $q'$).

Now, for the induction step, we have several cases to investigate:

• If $\phi$ is a disjunction $\phi_1 \vee \phi_2$, where $\vec{X}_i$ is the set of free variables of $\phi_i$ ($i = 1, 2$), then we first cylindrify the automata for $\phi_1$ and $\phi_2$ in such a way that they recognize the solutions of $\phi_1$ and $\phi_2$ with free variables $\vec{X}_1 \cup \vec{X}_2$ (see Proposition 12). More precisely, let $\vec{X}_1 \cup \vec{X}_2 = \{Y_1, \ldots, Y_n\}$ with $Y_1 < \ldots < Y_n$. Then we successively apply the $i$th cylindrification to the automaton of $\phi_1$ (resp. $\phi_2$) for the variables $Y_i$ which are not free in $\phi_1$ (resp. $\phi_2$). Then the automaton $\mathcal{A}_\phi$ is obtained as the union of these automata. (Rec is closed under union by Proposition 11.)

• If $\phi$ is a formula $\neg\phi_1$ then $\mathcal{A}_\phi$ is the automaton accepting the complement of $\mathcal{A}_{\phi_1}$. (See Theorem 5.)

• If $\phi$ is a formula $\exists X.\phi_1$, assume that $X$ corresponds to the $i$th component. Then $\mathcal{A}_\phi$ is the $i$th projection of $\mathcal{A}_{\phi_1}$ (see Proposition 12).

Figure 3.9: The automaton for $X \subseteq Y$. Its transition rules are $\bot\bot \to q$, $00(q, q) \to q$, $01(q, q) \to q$, $11(q, q) \to q$, $\bot 0(q, q) \to q$, $\bot 1(q, q) \to q$.

Figure 3.10: The automaton for $X = Y1$. Its transition rules are $\bot\bot \to q$, $1\bot(q, q) \to q'$, $01(q', q) \to q''$, $00(q'', q) \to q''$, $00(q, q'') \to q''$.

Figure 3.11: The automaton for $X = \varepsilon$. Its transition rules are $\bot \to q$, $1(q, q) \to q'$.

□ 



Example 34. Consider the following formula, with free variables $X, Y$:

$\forall x, y.(x \in X \wedge y \in Y) \Rightarrow \neg(x \geq y)$

We want to compute an automaton which accepts the assignments to $X, Y$ satisfying the formula. First, write the formula as

$\neg\exists X_1, Y_1.\ X_1 \subseteq X \wedge Y_1 \subseteq Y \wedge G(X_1, Y_1)$

where $G(X_1, Y_1)$ expresses that $X_1$ is a singleton $x$, $Y_1$ is a singleton $y$ and $x \geq y$. We can use the definition of $\geq$ as a WS2S formula, or compute directly the automaton, yielding

$\bot\bot \to q$
$11(q, q) \to q_2$
$1\bot(q, q) \to q_1$
$0\bot(q, q_1) \to q_1$
$0\bot(q_1, q) \to q_1$
$01(q_1, q) \to q_2$
$01(q, q_1) \to q_2$
$00(q_2, q) \to q_2$
$00(q, q_2) \to q_2$

where $q_2$ is the only final state. Now, using cylindrification, intersection, projection and negation we get the following automaton (intermediate steps yield large automata which would require a full page to be displayed):

$\bot\bot \to q_0$, $\quad \bot 1(q_0, q_0) \to q_1$, $\quad 1\bot(q_0, q_0) \to q_2$
$\bot 0(q_0, q_1) \to q_1$, $\quad \bot 0(q_1, q_0) \to q_1$, $\quad \bot 0(q_1, q_1) \to q_1$
$0\bot(q_0, q_2) \to q_2$, $\quad 0\bot(q_2, q_0) \to q_2$, $\quad 0\bot(q_2, q_2) \to q_2$
$\bot 1(q_0, q_1) \to q_1$, $\quad \bot 1(q_1, q_0) \to q_1$, $\quad \bot 1(q_1, q_1) \to q_1$
$1\bot(q_0, q_2) \to q_2$, $\quad 1\bot(q_2, q_0) \to q_2$, $\quad 1\bot(q_2, q_2) \to q_2$
$10(q_1, q_0) \to q_3$, $\quad 10(q_0, q_1) \to q_3$, $\quad 10(q_1, q_1) \to q_3$
$10(q_1, q_2) \to q_3$, $\quad 10(q_2, q_1) \to q_3$, $\quad 10(q_i, q_3) \to q_3$
$10(q_3, q_i) \to q_3$, $\quad 00(q_i, q_3) \to q_3$, $\quad 00(q_3, q_i) \to q_3$

where $i$ ranges over $\{0, 1, 2, 3\}$ and $q_3$ is the only final state.





3.3.6 Recognizable Sets are Definable 

We have seen in section 3.3.3 how to represent a term using a tuple of set variables. Now, we use this formula Term on the coding of tuples of terms; if $(t_1, \ldots, t_n) \in T(\mathcal{F})^n$, we write $[t_1, \ldots, t_n]$ for the $((|\mathcal{F}| + 1)^n + 1)$-tuple of finite sets which represents it: one set for the positions of $[t_1, \ldots, t_n]$ and one set for each element of the alphabet $(\mathcal{F} \cup \{\bot\})^n$. As has been seen in section 3.3.3, there is a WSkS formula $\mathrm{Term}([t_1, \ldots, t_n])$ which expresses the image of the coding.

Lemma 3. Every relation in Rec is definable. More precisely, if $R \in$ Rec there is a formula $\phi$ such that, for all terms $t_1, \ldots, t_n$, if $(S_1, \ldots, S_m) = [t_1, \ldots, t_n]$, then

$(S_1, \ldots, S_m) \models_2 \phi$ if and only if $(t_1, \ldots, t_n) \in R$

Proof. Let $\mathcal{A}$ be the automaton which accepts the set of terms $[t_1, \ldots, t_n]$ for $(t_1, \ldots, t_n) \in R$. The terminal alphabet of $\mathcal{A}$ is $\mathcal{F}' = (\mathcal{F} \cup \{\bot\})^n$, the set of states $Q$, the final states $Q_f$ and the set of transition rules $T$. Let $\mathcal{F}' = \{f_1, \ldots, f_m\}$ and $Q = \{q_1, \ldots, q_l\}$. The following formula $\phi_{\mathcal{A}}$ (with $m + 1$ free variables) defines the set $\{[t_1, \ldots, t_n] \mid (t_1, \ldots, t_n) \in R\}$.

$\exists Y_{q_1}, \ldots, \exists Y_{q_l}.$

$\quad \mathrm{Term}(X, X_{f_1}, \ldots, X_{f_m})$

$\quad \wedge\ \mathrm{Partition}(X, Y_{q_1}, \ldots, Y_{q_l})$

$\quad \wedge\ \bigvee_{q \in Q_f} \varepsilon \in Y_q$

$\quad \wedge\ \forall x.\ \bigwedge_{f \in \mathcal{F}'} \bigwedge_{q \in Q} \Big( (x \in X_f \wedge x \in Y_q) \Rightarrow \big( \bigvee_{f(q_1, \ldots, q_s) \to q \in T}\ \bigwedge_{i=1}^{s} xi \in Y_{q_i} \big) \Big)$

This formula basically expresses that there is a successful run of the automaton on $[t_1, \ldots, t_n]$: the variables $Y_{q_i}$ correspond to sets of positions which are labeled with $q_i$ by the run. They should be a partition of the set of positions. The root has to be labeled with a final state (the run is successful). Finally, the last line expresses local properties that have to be satisfied by the run: if the sons $xi$ of a position $x$ are labeled with $q_1, \ldots, q_n$ respectively and $x$ is labeled with symbol $f$ and state $q$, then there should be a transition $f(q_1, \ldots, q_n) \to q$ in the set of transitions.

We have to prove two inclusions. First assume that $(S, S_1, \ldots, S_m) \models_2 \phi$. Then $(S, S_1, \ldots, S_m) \models \mathrm{Term}(X, X_{f_1}, \ldots, X_{f_m})$, hence there is a term $u \in T(\mathcal{F}')$ whose set of positions is $S$ and such that for all $i$, $S_i$ is the set of positions labeled with $f_i$. Now, there is a partition $E_{q_1}, \ldots, E_{q_l}$ of $S$ which satisfies

$S, S_1, \ldots, S_m, E_{q_1}, \ldots, E_{q_l} \models \forall x.\ \bigwedge_{f \in \mathcal{F}'} \bigwedge_{q \in Q} \Big( (x \in X_f \wedge x \in Y_q) \Rightarrow \big( \bigvee_{f(q_1, \ldots, q_s) \to q \in T}\ \bigwedge_{i=1}^{s} xi \in Y_{q_i} \big) \Big)$

This implies that the labeling $E_{q_1}, \ldots, E_{q_l}$ is compatible with the transition rules: it defines a run of the automaton. Finally, the condition that the root $\varepsilon$ belongs to $E_{q_f}$ for some final state $q_f$ implies that the run is successful, hence that $u$ is accepted by the automaton.

Conversely, if $u$ is accepted by the automaton, then there is a successful run of $\mathcal{A}$ on $u$ and we can label its positions with states in such a way that this labeling is compatible with the transition rules in $\mathcal{A}$. □

Putting together Lemmas 2 and 3, we can state the following slogan (which is not very precise; the precise statements are given by the lemmas):

Theorem 24. $L$ is definable if and only if $L$ is in Rec.

And, as a consequence:

Theorem 25 ([TW68]). WSkS is decidable.

Proof. Given a formula $\phi$ of WSkS, by Lemma 2, we can compute a finite tree automaton which has the same solutions as $\phi$. Now, assume that $\phi$ has no free variable. Then the alphabet of the automaton is empty (or, more precisely, it contains the only constant $\top$ according to what we explained in Section 3.2.4). Finally, the formula is valid iff the constant $\top$ is in the language, i.e. iff there is a rule $\top \to q_f$ for some $q_f \in Q_f$. □



3.3.7 Complexity Issues 

We have seen in chapter 1 that, for finite tree automata, emptiness can be decided in linear time (and is PTIME-complete) and that inclusion is EXPTIME-complete. Considering WSkS formulas with a fixed number $N$ of quantifier alternations, the decision method sketched in the previous section will work in time which is a tower of exponentials, the height of the tower being $O(N)$. This is so because each time we encounter a sequence $\forall X, \exists Y$, the existential quantification corresponds to a projection, which may yield a non-deterministic automaton, even if the input automaton was deterministic. Then the elimination of $\forall X$ requires a determinization (because we have to compute a complement automaton), which requires in general exponential time and exponential space.

Actually, it is not really possible to do much better since, even when $k = 1$, deciding a formula of WSkS requires non-elementary time, as shown in [SM73].



3.3.8 Extensions 

There are several extensions of the logic, which we already mentioned: though quantification is restricted to finite sets, we may consider infinite sets as models (this is what is often called weak second-order monadic logic with k successors and also written WSkS), or consider quantifications on arbitrary sets (this is the full SkS).

These logics require more sophisticated automata which recognize sets of infinite terms. The proof of Theorem 25 carries over to these extensions, with the provision that the devices enjoy the required closure and decidability properties. But this becomes much more intricate in the case of infinite terms. Indeed, for infinite terms, it is not possible to design bottom-up tree automata. We have to use a top-down device. But then, as mentioned in chapter 1, we cannot expect to reduce the non-determinism. Now, the closure by complement becomes problematic because the usual way of computing the complement uses reduction of non-determinism as a first step.

It is out of the scope of this book to define and study automata on infinite objects (see [Tho90] instead). Let us simply mention that the closure under complement for Rabin automata which work on infinite trees (this result is known as Rabin's Theorem) is one of the most difficult results in the field.

3.4 Examples of Applications

3.4.1 Terms and Sorts 

The most basic example is what is known in the algebraic specification community as order-sorted signatures. These signatures are exactly what we called bottom-up tree automata. There are only differences in the syntax. For instance, the following signature:

SORTS: Nat, int
SUBSORTS: Nat ≤ int
FUNCTION DECLARATIONS:
    0    :             → Nat
    +    : Nat × Nat   → Nat
    s    : Nat         → Nat
    p    : Nat         → int
    +    : int × int   → int
    abs  : int         → Nat
    fact : Nat         → Nat

is an automaton whose states are Nat, int with an ε-transition from Nat to int and each function declaration corresponds to a transition of the automaton. For example +(Nat, Nat) → Nat. The set of well-formed terms (as in the algebraic specification terminology) is the set of terms recognized by the automaton in any state.
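The reading of a signature as a bottom-up automaton can be made concrete. The sketch below is an illustration only, under assumptions that are not in the text: terms are nested tuples `(symbol, child_1, ..., child_n)`, each declaration is stored as a rule in the hypothetical table `RULES`, the subsort relation is stored as the ε-transition table `EPSILON`, and `sorts` computes the set of states in which a term is recognized.

```python
from itertools import product

# Each declaration f : s1 x ... x sn -> s is read as a transition f(s1, ..., sn) -> s.
RULES = {
    ("0", ()): {"Nat"},
    ("+", ("Nat", "Nat")): {"Nat"},
    ("+", ("int", "int")): {"int"},
    ("s", ("Nat",)): {"Nat"},
    ("p", ("Nat",)): {"int"},
    ("abs", ("int",)): {"Nat"},
    ("fact", ("Nat",)): {"Nat"},
}
EPSILON = {"Nat": {"int"}}   # the subsort Nat <= int, as an epsilon-transition


def close(states):
    """Saturate a set of states under epsilon-transitions."""
    todo, seen = list(states), set(states)
    while todo:
        q = todo.pop()
        for r in EPSILON.get(q, ()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen


def sorts(term):
    """Set of states (sorts) in which the ground term is recognized."""
    symbol, *args = term
    arg_sorts = [sorts(a) for a in args]
    reached = set()
    for combo in product(*arg_sorts):   # every choice of one state per argument
        reached |= RULES.get((symbol, combo), set())
    return close(reached)


zero = ("0",)
pred_zero = ("p", zero)
```

A term is well-formed when `sorts` returns a non-empty set. For instance `("fact", pred_zero)` gets no sort, because `p(0)` is recognized only in state int and `fact` expects an argument in state Nat.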

More general typing systems also correspond to more general automata (as will be seen e.g. in the next chapter).

This correspondence is not surprising; types and sorts are introduced in order to prevent run-time errors by some "abstract interpretation" of the inputs. Tree automata and tree grammars also provide such a symbolic evaluation mechanism. For other applications of tree automata in this direction, see e.g. chapter 5.

From what we have seen in this chapter, we can go beyond simply recognizing the set of well-formed terms. Consider the following sort constraints (the alphabet $\mathcal{F}$ of function symbols is given):

The set of sort expressions $SE$ is the least set such that

• $SE$ contains a finite set of sort symbols $S$, including the two particular symbols $\top_S$ and $\bot_S$.

• If $s_1, s_2 \in SE$, then $s_1 \vee s_2$, $s_1 \wedge s_2$, $\neg s_1$ are in $SE$.

• If $s_1, \ldots, s_n$ are in $SE$ and $f$ is a function symbol of arity $n$, then $f(s_1, \ldots, s_n) \in SE$.

The atomic formulas are expressions $t \in s$ where $t \in T(\mathcal{F}, \mathcal{X})$ and $s \in SE$. The formulas are arbitrary first-order formulas built on these atomic formulas.

These formulas are interpreted as follows: we are given an order-sorted signature (or a tree automaton) whose set of sorts is $S$. We define the interpretation $[\![\cdot]\!]_S$ of sort expressions as follows:

• if $s \in S$, $[\![s]\!]_S$ is the set of terms in $T(\mathcal{F})$ that are accepted in state $s$.

• $[\![\top_S]\!]_S = T(\mathcal{F})$ and $[\![\bot_S]\!]_S = \emptyset$

• $[\![s_1 \vee s_2]\!]_S = [\![s_1]\!]_S \cup [\![s_2]\!]_S$, $[\![s_1 \wedge s_2]\!]_S = [\![s_1]\!]_S \cap [\![s_2]\!]_S$, $[\![\neg s]\!]_S = T(\mathcal{F}) \setminus [\![s]\!]_S$

• $[\![f(s_1, \ldots, s_n)]\!]_S = \{f(t_1, \ldots, t_n) \mid t_1 \in [\![s_1]\!]_S, \ldots, t_n \in [\![s_n]\!]_S\}$

Figure 3.12: u encompasses t

An assignment $\sigma$, mapping variables to terms in $T(\mathcal{F})$, satisfies $t \in s$ (we also say that $\sigma$ is a solution of $t \in s$) if $t\sigma \in [\![s]\!]_S$. Solutions of arbitrary formulas are defined as expected. Then

Theorem 26. Sort constraints are decidable.

The decision technique is based on automata computations, following the closure properties of $\mathrm{Rec}_\times$ and a decomposition lemma for constraints of the form $f(t_1, \ldots, t_n) \in s$.

More results and applications of sort constraints are discussed in the bibliographic notes.

3.4.2 The Encompassment Theory for Linear Terms 

Definition 9. If $t \in T(\mathcal{F}, \mathcal{X})$ and $u \in T(\mathcal{F})$, $u$ encompasses $t$ if there is a substitution $\sigma$ such that $t\sigma$ is a subterm of $u$. (See Figure 3.12.) This binary relation is denoted $t \trianglelefteq u$ or, seen as a unary relation on ground terms parametrized by $t$: $\trianglelefteq_t(u)$.

Encompassment plays an important role in rewriting: a term $t$ is reducible by a term rewriting system $\mathcal{R}$ if and only if $t$ encompasses at least one left hand side of a rule.

The relationship with tree automata is given by the proposition:

Proposition 15. If $t$ is linear, then the set of terms that encompass $t$ is recognized by a NFTA of size $O(|t|)$.

Proof. To each non-variable subterm $v$ of $t$ we associate a state $q_v$. In addition we have a state $q_\top$. The only final state is $q_t$. The transition rules are:

• $f(q_\top, \ldots, q_\top) \to q_\top$ for all function symbols.

• $f(q_{t_1}, \ldots, q_{t_n}) \to q_{f(t_1, \ldots, t_n)}$ if $f(t_1, \ldots, t_n)$ is a subterm of $t$, where $q_{t_i}$ is actually $q_\top$ if $t_i$ is a variable.

• $f(q_\top, \ldots, q_\top, q_t, q_\top, \ldots, q_\top) \to q_t$ for all function symbols $f$ whose arity is at least 1.

The proof that this automaton indeed recognizes the set of terms that encompass $t$ is left to the reader. □
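The three families of rules can be simulated directly, without building the automaton explicitly. The following sketch is an illustration under assumptions that are not in the text: terms are nested tuples, variables of the pattern are plain strings, the pattern is linear and is not itself a variable, each state $q_v$ is represented by the subterm $v$, and the names `run` and `encompasses` are hypothetical. The function `run` computes the set of states reachable on a ground term by the non-deterministic automaton.

```python
TOP = "⊤"   # stands for the state q_⊤


def is_var(term):
    return isinstance(term, str)


def subterms(term):
    """Yield all non-variable subterms of a term."""
    if not is_var(term):
        yield term
        for child in term[1:]:
            yield from subterms(child)


def state_of(term):
    # a variable of the pattern is matched by any term, i.e. by state q_⊤
    return TOP if is_var(term) else term


def run(pattern, ground):
    """States reached on `ground` by the NFTA built from the linear `pattern`."""
    children = [run(pattern, c) for c in ground[1:]]
    reached = {TOP}                                   # f(q_⊤, ..., q_⊤) -> q_⊤
    for v in subterms(pattern):
        same_head = v[0] == ground[0] and len(v) == len(ground)
        if same_head and all(state_of(vi) in si for vi, si in zip(v[1:], children)):
            reached.add(v)                            # f(q_t1, ..., q_tn) -> q_f(t1..tn)
    if any(pattern in si for si in children):
        reached.add(pattern)                          # propagate q_t upwards
    return reached


def encompasses(ground, pattern):
    return pattern in run(pattern, ground)


pattern = ("f", ("g", "x"), "y")                      # the linear term f(g(x), y)
u = ("h", ("f", ("g", ("a",)), ("b",)))               # h(f(g(a), b))
```

Here `encompasses(u, pattern)` holds because the subterm `f(g(a), b)` of `u` is an instance of the pattern, whereas `f(a, b)` does not encompass it.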

Note that the automaton may be non-deterministic. With a slight modification, if $u$ is a linear term, we can construct in linear time an automaton which accepts the set of instances of $u$ (this is also left as an exercise in chapter 1, exercise 8).

Corollary 4. If $\mathcal{R}$ is a term rewriting system all of whose left members are linear, then the set of reducible terms in $T(\mathcal{F})$, as well as the set NF of irreducible terms in $T(\mathcal{F})$, are recognized by a finite tree automaton.

Proof. This is a consequence of Theorem 5. □

The theory of reducibility associated with a set of terms $S \subseteq T(\mathcal{F}, \mathcal{X})$ is the set of first-order formulas built on the unary predicate symbols $E_t$, $t \in S$, where $E_t$ is interpreted as the set of terms encompassing $t$.

Theorem 27. The reducibility theory associated with a set of linear terms is decidable.

Proof. By Proposition 15, the set of solutions of an atomic formula is recognizable, hence definable in WSkS by Lemma 3. Hence, any first-order formula built on these atomic predicate symbols can be translated into a (second-order) formula of WSkS which has the same models (up to the coding of terms into tuples of sets). Then, by Theorem 25, the reducibility theory associated with a set of linear terms is decidable. □

Note however that we do not use here the full power of WSkS. Actually, the solutions of a Boolean combination of atomic formulas are in $\mathrm{Rec}_\times$. So, we cannot apply the complexity results for WSkS here. (In fact, the complexity of the reducibility theory is unknown so far.)

Let us simply show an example of an interesting property of rewrite systems which can be expressed in this theory.

Definition 10. Given a term rewriting system $\mathcal{R}$, a term $t$ is ground reducible if for every ground substitution $\sigma$, $t\sigma$ is reducible by $\mathcal{R}$.

Note that a term might be irreducible and still ground reducible. For instance consider the alphabet $\mathcal{F} = \{0, s\}$ and the rewrite system $\mathcal{R} = \{s(s(0)) \to 0\}$. Then the term $s(s(x))$ is irreducible by $\mathcal{R}$, but all its ground instances are reducible.
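This example can be checked on an initial segment of the ground instances. The sketch below is only an illustration: it assumes terms are nested tuples, the names `s_power`, `LHS` and `has_subterm` are hypothetical, and it tests finitely many instances rather than proving the claim. Over the alphabet {0, s} the ground instances of s(s(x)) are exactly the terms s^(n+2)(0), and since the left-hand side s(s(0)) is ground, such a term is reducible exactly when s(s(0)) occurs in it as a subterm.

```python
def s_power(n):
    """Build the ground term s^n(0) as nested tuples."""
    term = ("0",)
    for _ in range(n):
        term = ("s", term)
    return term


LHS = ("s", ("s", ("0",)))   # left-hand side of the rule s(s(0)) -> 0


def has_subterm(term, pattern):
    """True if the ground term `pattern` occurs as a subterm of `term`."""
    return term == pattern or any(has_subterm(c, pattern) for c in term[1:])


# every ground instance s^(n+2)(0) of s(s(x)) tested here is reducible
instances_reducible = all(has_subterm(s_power(n + 2), LHS) for n in range(50))
```

By contrast, `s_power(1)`, which is not an instance of s(s(x)), contains no occurrence of the left-hand side and is irreducible.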

It turns out that ground reducibility of $t$ is expressible in the encompassment theory by the formula:

$\forall x.\big( \trianglelefteq_t(x) \Rightarrow \bigvee_{i=1}^{n} \trianglelefteq_{l_i}(x) \big)$

where $l_1, \ldots, l_n$ are the left hand sides of the rewrite system. By Theorem 27, if $t, l_1, \ldots, l_n$ are linear, then ground reducibility is decidable. Actually, it has been shown that this problem is EXPTIME-complete, but it is beyond the scope of this book to give the proof.

3.4.3 The First-order Theory of a Reduction Relation: the Case Where no Variables are Shared

We consider again an application of tree automata to a decision problem in logic 
and term rewriting. 

Consider the following logical theory. Let $\mathcal{L}$ be the set of all first-order 
formulas using no function symbols and a single binary predicate symbol $\to$. 

Given a rewrite system $\mathcal{R}$, interpreting $\to$ as $\to_{\mathcal{R}}$ yields the theory of 
one-step rewriting; interpreting $\to$ as $\to^*_{\mathcal{R}}$ yields the theory of rewriting. 

Both theories are undecidable for arbitrary $\mathcal{R}$. They become however decidable 
if we restrict our attention to term rewriting systems in which each variable 
occurs at most once. Basically, the reason is given by the following: 

Proposition 16. If $\mathcal{R}$ is a linear rewrite system such that left and right 
members of the rules do not share variables, then $\to^*_{\mathcal{R}}$ is recognized by a GTT. 

Proof. As in the proof of Proposition 15, we can construct in linear time a 
(non-deterministic) automaton which accepts the set of instances of a linear 
term. For each rule $l_i \to r_i$ we can construct a pair $(\mathcal{A}_i, \mathcal{A}'_i)$ of automata which 
respectively recognize the set of instances of $l_i$ and the set of instances of $r_i$. 
Assume that the sets of states of the $\mathcal{A}_i$'s are pairwise disjoint and that each 
$\mathcal{A}_i$ has a single final state $q_f^i$. We may assume a similar property for the $\mathcal{A}'_i$'s: 
they do not share states and for each i, the only common state between $\mathcal{A}_i$ and 
$\mathcal{A}'_i$ is $q_f^i$ (the final state for both of them). Then $\mathcal{A}$ (resp. $\mathcal{A}'$) is the union of 
the $\mathcal{A}_i$'s (resp. $\mathcal{A}'_i$'s): the states are the union of all sets of states of the 
$\mathcal{A}_i$'s (resp. $\mathcal{A}'_i$'s), transitions and final states are also unions of the 
transitions and final states of each individual automaton. 

We claim that $(\mathcal{A}, \mathcal{A}')$ defines a GTT whose closure by iteration $(\mathcal{A}^*, \mathcal{A}'^*)$ 
(which is again a GTT according to Theorem 23) accepts $\to^*_{\mathcal{R}}$. For, assume 
first that $u \xrightarrow[l_i \to r_i]{p} v$. Then $u|_p$ is an instance $l_i\sigma$ of $l_i$, hence is accepted in state 
$q_f^i$. $v|_p$ is an instance $r_i\theta$ of $r_i$, hence accepted in state $q_f^i$. Now, $v = u[r_i\theta]_p$, 
hence (u, v) is accepted by the GTT $(\mathcal{A}, \mathcal{A}')$. It follows that if $u \to^*_{\mathcal{R}} v$, (u, v) is 
accepted by $(\mathcal{A}^*, \mathcal{A}'^*)$. 

Conversely, assume that (u, v) is accepted by $(\mathcal{A}, \mathcal{A}')$, then 

$$u \xrightarrow[\mathcal{A}]{*} C[q_1, \ldots, q_n]_{p_1, \ldots, p_n} \xleftarrow[\mathcal{A}']{*} v$$

Moreover, each $q_i$ is some state $q_f^j$, which, by definition, implies that $u|_{p_i}$ is an 
instance of $l_j$ and $v|_{p_i}$ is an instance of $r_j$. Now, since $l_j$ and $r_j$ do not share 
variables, for each i, $u|_{p_i} \to_{\mathcal{R}} v|_{p_i}$, which implies that $u \to^*_{\mathcal{R}} v$. Now, if (u, v) is 
accepted by $(\mathcal{A}^*, \mathcal{A}'^*)$, u can be rewritten into v by the transitive closure of $\to^*_{\mathcal{R}}$, 
which is $\to^*_{\mathcal{R}}$ itself. □ 

Theorem 28. If $\mathcal{R}$ is a linear term rewriting system such that left and right 
members of the rules do not share variables, then the first-order theory of 
rewriting is decidable. 

Proof. By Proposition 16, $\to^*_{\mathcal{R}}$ is recognized by a GTT. From Proposition 
9, $\to^*_{\mathcal{R}}$ is in Rec. By Lemma 3, there is a WSkS formula whose solutions are 
exactly the pairs (s, t) such that $s \to^*_{\mathcal{R}} t$. Finally, by Theorem 25, the first-order 
theory of $\to^*_{\mathcal{R}}$ is decidable. □ 

3.4.4 Reduction Strategies 

So far, we gave examples of first-order theories (or constraint systems) which 
can be decided using tree automata techniques. Other examples will be given 
in the next two chapters. We give here another example of application in a 
different spirit: we are going to show how to decide the existence of (and 
compute) "optimal reduction strategies" in term rewriting systems. Informally, a 
reduction sequence is optimal when every redex which is contracted along this 
sequence has to be contracted in any reduction sequence yielding a normal form. 
For example, if we consider the rewrite system $\{x \vee \top \to \top;\ \top \vee x \to \top\}$, there 
is no optimal sequential reduction strategy in the above sense since, given an 
expression $e_1 \vee e_2$, where $e_1$ and $e_2$ are unevaluated, the strategy should 
specify which of $e_1$ or $e_2$ has to be evaluated first. However, if we start with $e_1$, 
then maybe $e_2$ will reduce to $\top$ and the evaluation step on $e_1$ was unnecessary. 
Symmetrically, evaluating $e_2$ first may lead to unnecessary computations. An 
interesting question is to give sufficient criteria for a rewrite system to admit 
optimal strategies and, in case there is such a strategy, give it explicitly. 

The formalization of these notions was given by Huet and Lévy in [HL91] 
who introduce the notion of sequentiality. We give briefly a summary of (part 
of) their definitions. 

$\mathcal{F}$ is a fixed alphabet of function symbols and $\mathcal{F}_\Omega = \mathcal{F} \cup \{\Omega\}$ is the alphabet 
$\mathcal{F}$ enriched with a new constant $\Omega$ (whose intended meaning is "unevaluated 
term"). 

We define on $T(\mathcal{F}_\Omega)$ the relation "less evaluated than" as: 

$u \sqsubseteq v$ if and only if either $u = \Omega$ or else $u = f(u_1, \ldots, u_n)$, 
$v = f(v_1, \ldots, v_n)$ and for all i, $u_i \sqsubseteq v_i$ 
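
The recursive definition of "less evaluated than" translates directly into a structural recursion. The sketch below assumes an illustrative encoding that is not part of the text: the constant for an unevaluated subterm is the string `"Ω"`, and every other term is a tuple `(symbol, child_1, ..., child_n)`.

```python
OMEGA = "Ω"  # assumed encoding of the constant standing for an unevaluated term

def less_evaluated(u, v):
    """Return True iff u is less evaluated than v.

    A term is either OMEGA or a tuple (symbol, child_1, ..., child_n).
    """
    if u == OMEGA:
        return True            # Omega is below every term
    if v == OMEGA:
        return False           # a term with a root symbol is never below Omega
    return (u[0] == v[0] and len(u) == len(v)
            and all(less_evaluated(a, b) for a, b in zip(u[1:], v[1:])))

a = ("a",)
print(less_evaluated(("f", OMEGA, OMEGA), ("f", a, OMEGA)))   # True
print(less_evaluated(("f", a, OMEGA), ("f", OMEGA, OMEGA)))   # False
```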

Definition 11. A predicate P on $T(\mathcal{F}_\Omega)$ is monotonic if $u \in P$ and $u \sqsubseteq v$ 
implies $v \in P$. 

For example, a monotonic predicate of interest for rewriting is the predicate 
$N_{\mathcal{R}}$: $t \in N_{\mathcal{R}}$ if and only if there is a term $u \in T(\mathcal{F})$ such that u is irreducible 
by $\mathcal{R}$ and $t \to^*_{\mathcal{R}} u$. 

Definition 12. Let P be a monotonic predicate on $T(\mathcal{F}_\Omega)$. If $\mathcal{R}$ is a term 
rewriting system and $t \in T(\mathcal{F}_\Omega)$, a position p of $\Omega$ in t is an index for P if for 
all terms $u \in T(\mathcal{F}_\Omega)$ such that $t \sqsubseteq u$ and $u \in P$, we have $u|_p \neq \Omega$. 

In other words: it is necessary to evaluate t at position p in order to have 
the predicate P satisfied. 

Example 35. Let $\mathcal{R} = \{f(g(x), y) \to g(f(x, y));\ f(a, x) \to a;\ b \to g(b)\}$. 
Then 1 is an index of $f(\Omega, \Omega)$ for $N_{\mathcal{R}}$: any reduction yielding a normal form 
without $\Omega$ will have to evaluate the term at position 1. More formally, every 
term $f(t_1, t_2)$ which can be reduced to a term in $T(\mathcal{F})$ in normal form satisfies 
$t_1 \neq \Omega$. On the contrary, 2 is not an index of $f(\Omega, \Omega)$ since $f(a, \Omega) \to^*_{\mathcal{R}} a$. 

Definition 13. A monotonic predicate P is sequential if every term t such that: 

• $t \notin P$ 

• there is $u \in T(\mathcal{F})$, $t \sqsubseteq u$ and $u \in P$ 

has an index for P. 

If $N_{\mathcal{R}}$ is sequential, the reduction strategy consisting of reducing an index 
is optimal for non-overlapping and left linear rewrite systems. 

Now, the relationship with tree automata is given by the following result: 

Theorem 29. If P is definable in WSkS, then the sequentiality of P is 
expressible as a WSkS formula. 

The proof of this result is quite easy: it suffices to translate directly the 
definitions. 

For instance, if $\mathcal{R}$ is a rewrite system whose left and right members do 
not share variables, then $N_{\mathcal{R}}$ is recognizable (by Propositions 16 and 9), hence 
definable in WSkS by Lemma 3, and the sequentiality of $N_{\mathcal{R}}$ is decidable by 
Theorems 29 and 25. 

In general, the sequentiality of $N_{\mathcal{R}}$ is undecidable. However, one can notice 
that, if $\mathcal{R}$ and $\mathcal{R}'$ are two rewrite systems such that $\to^*_{\mathcal{R}} \subseteq \to^*_{\mathcal{R}'}$, then a 
position p which is an index for $\mathcal{R}'$ is also an index for $\mathcal{R}$. (And thus, $\mathcal{R}$ is 
sequential whenever $\mathcal{R}'$ is sequential.) 

For instance, we may approximate the term rewriting system, replacing all 
right hand sides by a new variable which does not occur in the corresponding 
left member. Let $\mathcal{R}?$ be this approximation and $N?$ be the predicate $N_{\mathcal{R}?}$. (This 
is the approximation considered by Huet and Lévy.) 

Another, refined, approximation consists in renaming all variables of the 
right hand sides of the rules in such a way that all right hand sides become 
linear and do not share variables with their left hand sides. Let $\mathcal{R}'$ be such an 
approximation of $\mathcal{R}$. The predicate $N_{\mathcal{R}'}$ is written NV. 

Proposition 17. If $\mathcal{R}$ is left linear, then the predicates $N?$ and NV are 
definable in WSkS and their sequentiality is decidable. 

Proof. The approximations $\mathcal{R}?$ and $\mathcal{R}'$ satisfy the hypotheses of Proposition 16 
and hence $\to^*_{\mathcal{R}?}$ and $\to^*_{\mathcal{R}'}$ are recognized by GTTs. On the other hand, the 
set of terms in normal form for a left linear rewrite system is recognized by a 
finite tree automaton (see Corollary 4). By Proposition 9 and Lemma 3, all 
these predicates are definable in WSkS. Then $N?$ and NV are also definable in 
WSkS. For instance for NV: 

$$NV(t) \stackrel{\mathrm{def}}{=} \exists u.\ t \to^*_{\mathcal{R}'} u \wedge u \in NF$$

Then, by Theorem 29, the sequentiality of $N?$ and of NV are definable in 
WSkS and by Theorem 25 they are decidable. □ 

3.4.5 Application to Rigid E-unification 

Given a finite (universally quantified) set of equations E, the classical problem 
of E-unification is, given an equation s = t, find substitutions $\sigma$ such that 
$E \models s\sigma = t\sigma$. The associated decision problem is to decide whether such a 
substitution exists. This problem is in general unsolvable; there are decision 
procedures only for restricted classes of axioms E. 

The simultaneous rigid E-unification problem is slightly different: we are 
still given E and a finite set of equations $s_i = t_i$, and the question is to find a 
substitution $\sigma$ such that 

$$\models \Big(\bigwedge_{e \in E} e\sigma\Big) \Rightarrow \Big(\bigwedge_{i=1}^{n} s_i\sigma = t_i\sigma\Big)$$

The associated decision problem is to decide the existence of such a substitution. 

The relevance of such questions to automated deduction is very briefly 
described in the bibliographic notes. We want here to show how tree automata 
help in this decision problem. 

Simultaneous rigid E-unification is undecidable in general. However, for the 
one variable case, we have: 

Theorem 30. The simultaneous rigid E-unification problem with one variable 
is EXPTIME-complete. 

The EXPTIME membership is a direct consequence of the following lemma, 
together with closure and decision properties for recognizable tree languages. 
The EXPTIME-hardness is obtained by reduction of the intersection non-emptiness 
problem (see Theorem 12). 

Lemma 4. The solutions of a rigid E-unification problem with one variable are 
recognizable by a finite tree automaton. 

Proof. (sketch) Assume that we have a single equation s = t. Let x be the 
only variable occurring in E, s = t and $\hat{E}$ be the set E in which x is considered 
as a constant. Let R be a canonical ground rewrite system (see e.g. [DJ90]) 
associated with $\hat{E}$ (and for which x is minimal). We define v as x if s and t have 
the same normal form w.r.t. R and as the normal form of $x\sigma$ w.r.t. R otherwise. 

Assume $E\sigma \models s\sigma = t\sigma$. If $v \not\equiv x$, we have $\hat{E} \cup \{x = v\} \models x = x\sigma$. Hence 
$\hat{E} \cup \{x = v\} \models s = t$ in any case. Conversely, assume that $\hat{E} \cup \{x = v\} \models s = t$. 
Then $\hat{E} \cup \{x = x\sigma\} \models s = t$, hence $E\sigma \models s\sigma = t\sigma$. 

Now, assume $v \not\equiv x$. Then either there is a subterm u of an equation in $\hat{E}$ 
such that $\hat{E} \models u = v$ or else $R_1 = R \cup \{v \to x\}$ is canonical. In this case, from 
$\hat{E} \cup \{v = x\} \models s = t$, we deduce that either $\hat{E} \models s = t$ (and $v \equiv x$) or there is a 
subterm u of s, t such that $\hat{E} \models v = u$. We can conclude that, in all cases, there 
is a subterm u of $E \cup \{s = t\}$ such that $\hat{E} \models u = v$. 

To summarize, $\sigma$ is such that $E\sigma \models s\sigma = t\sigma$ iff there is a subterm u of 
$E \cup \{s = t\}$ such that $\hat{E} \models u = x\sigma$ and $\hat{E} \cup \{u = x\} \models s = t$. 

If we let T be the set of subterms u of $E \cup \{s = t\}$ such that $\hat{E} \cup \{u = x\} \models 
s = t$, then T is finite (and computable in polynomial time). The set of solutions 
is then $\to^*_{R^{-1}}(T)$, which is a recognizable set of trees, thanks to Proposition 
16. □ 

Further comments and references are given in the bibliographic notes. 

3.4.6 Application to Higher-order Matching 

We give here a last application (but the list is not closed!), in the typed 
lambda-calculus. 

To be self-contained, let us first recall some basic definitions in typed lambda 
calculus. 

The set of types of the simply typed lambda calculus is the least set 
containing the constant o (basic type) and such that $\tau \to \tau'$ is a type whenever $\tau$ and 
$\tau'$ are types. 

Using the so-called Curryfication, any type $\tau \to (\tau' \to \tau'')$ is written $\tau, \tau' \to \tau''$. 
In this way all non-basic types are of the form $\tau_1, \ldots, \tau_n \to o$ with the intuitive 
meaning that this is the type of functions taking n arguments of respective types 
$\tau_1, \ldots, \tau_n$ and whose result is of basic type o. 

The order of a type $\tau$ is defined by: 

• $O(o) = 1$ 

• $O(\tau_1, \ldots, \tau_n \to o) = 1 + \max\{O(\tau_1), \ldots, O(\tau_n)\}$ 
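
The two clauses defining the order can be read as a recursive function. In the sketch below the encoding is an assumption made for illustration: the base type is the string `"o"`, and a non-basic type with arguments t1, ..., tn and result o is the tuple `(t1, ..., tn)`.

```python
def order(tau):
    """Order of a curried type.

    "o" is the base type; a non-empty tuple (t1, ..., tn) stands for
    the type t1, ..., tn -> o (assumed encoding).
    """
    if tau == "o":
        return 1
    return 1 + max(order(arg) for arg in tau)

print(order("o"))                        # 1
print(order(("o", "o")))                 # 2: o, o -> o
print(order((("o", "o"), ("o",))))       # 3: (o, o -> o), (o -> o) -> o
```

The last type is that of a third order variable taking one binary and one unary function on the base type as arguments.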

Given, for each type $\tau$, a set of variables $\mathcal{X}_\tau$ of type $\tau$ and a set $\mathcal{C}_\tau$ of constants 
of type $\tau$, the set of terms (of the simply typed lambda calculus) is the least 
set $\Lambda$ such that: 

• $x \in \mathcal{X}_\tau$ is a term of type $\tau$ 

• $c \in \mathcal{C}_\tau$ is a term of type $\tau$ 

• If $x_1 \in \mathcal{X}_{\tau_1}, \ldots, x_n \in \mathcal{X}_{\tau_n}$ and t is a term of type $\tau$, then $\lambda x_1, \ldots, x_n : t$ is 
a term of type $\tau_1, \ldots, \tau_n \to \tau$ 

• If t is a term of type $\tau_1, \ldots, \tau_n \to \tau$ and $t_1, \ldots, t_n$ are terms of respective 
types $\tau_1, \ldots, \tau_n$, then $t(t_1, \ldots, t_n)$ is a term of type $\tau$. 

The order of a term t is the order of its type $\tau(t)$. 

The set of free variables Var(t) of a term t is defined by: 

• $Var(x) = \{x\}$ if x is a variable 

• $Var(c) = \emptyset$ if c is a constant 

• $Var(\lambda x_1, \ldots, x_n : t) = Var(t) \setminus \{x_1, \ldots, x_n\}$ 

• $Var(t(u_1, \ldots, u_n)) = Var(t) \cup Var(u_1) \cup \ldots \cup Var(u_n)$ 
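
The four clauses defining Var can be sketched as a recursive function. The term encoding used here (tagged tuples `("var", name)`, `("const", name)`, `("lam", [names], body)`, `("app", head, [args])`) is an assumption chosen for illustration; types are ignored since they play no role in the computation of free variables.

```python
def free_vars(t):
    """Set of free variables of a lambda term given as a tagged tuple."""
    kind = t[0]
    if kind == "var":
        return {t[1]}
    if kind == "const":
        return set()
    if kind == "lam":                     # ("lam", [x1, ..., xn], body)
        _, bound, body = t
        return free_vars(body) - set(bound)
    if kind == "app":                     # ("app", head, [u1, ..., un])
        _, head, args = t
        return free_vars(head).union(*(free_vars(a) for a in args))
    raise ValueError(f"unknown term constructor: {kind!r}")

# lambda y3 : f(y3, z), where f is a constant and z a variable
term = ("lam", ["y3"], ("app", ("const", "f"), [("var", "y3"), ("var", "z")]))
print(free_vars(term))   # {'z'}
```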

Terms are always assumed to be in $\eta$-long form, i.e. they are assumed to 
be in normal form with respect to the rule: 

$(\eta)$  $t \to \lambda x_1, \ldots, x_n . t(x_1, \ldots, x_n)$  if $\tau(t) = \tau_1, \ldots, \tau_n \to \tau$ 
and $x_i \in \mathcal{X}_{\tau_i} \setminus Var(t)$ for all i 

We define the $\alpha$-equivalence $=_\alpha$ on $\Lambda$ as the least congruence relation such 
that: $\lambda x_1, \ldots, x_n : t =_\alpha \lambda x'_1, \ldots, x'_n : t'$ when 

• t' is the term obtained from t by substituting, for every index i, every free 
occurrence of $x_i$ with $x'_i$. 

• There is no subterm of t in which, for some index i, both $x_i$ and $x'_i$ occur 
free. 

In the following, we consider only lambda terms modulo $\alpha$-equivalence. Then 
we may assume that, in any term, any variable is bound at most once and 
free variables do not have bound occurrences. 

The $\beta$-reduction is defined on $\Lambda$ as the least binary relation $\to_\beta$ such that 

• $(\lambda x_1, \ldots, x_n : t)(t_1, \ldots, t_n) \to_\beta t\{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}$ 

• for every context C, $C[t] \to_\beta C[u]$ whenever $t \to_\beta u$ 

It is well-known that $\beta\eta$-reduction is terminating and confluent on $\Lambda$ and, 
for every term $t \in \Lambda$, we let $t{\downarrow}$ be the unique normal form of t. 

A matching problem is an equation s = t where $s, t \in \Lambda$ and $Var(t) = \emptyset$. 
A solution of a matching problem is a substitution $\sigma$ of the free variables of s 
such that $s\sigma{\downarrow} = t{\downarrow}$. 

Whether or not the matching problem is decidable is an open question at the 
time we write this book. However, it can be decided when every free variable 
occurring in s is of order less than or equal to 4. We sketch here how tree automata 
may help in this matter. We will consider only two special cases here, leaving the 
general case as well as details of the proofs as exercises (see also bibliographic 
notes). 

First consider a problem 

(1)  $x(s_1, \ldots, s_n) = t$ 

where x is a third order variable and $s_1, \ldots, s_n, t$ are terms without free variables. 

The first result states that the set of solutions is recognizable by a 
□-automaton. □-automata are a slight extension of finite tree automata: we 
assume here that the alphabet contains a special symbol □. Then a term u is 
accepted by a □-automaton $\mathcal{A}$ if and only if there is a term v which is accepted 
(in the usual sense) by $\mathcal{A}$ and such that u is obtained from v by replacing each 
occurrence of the symbol □ with a term (of appropriate type). Note that two 
distinct occurrences of □ need not be replaced with the same term. 

We consider the automaton $\mathcal{A}_{s_1, \ldots, s_n, t}$ defined by: $\mathcal{F}$ consists of all symbols 
occurring in t plus the variable symbols $x_1, \ldots, x_n$ whose types are respectively 
the types of $s_1, \ldots, s_n$ and the constant □. 

The set of states Q consists of all subterms of t, which we write $q_u$ (instead 
of u) and a state $q_\Box$. In addition, we have the final state $q_f$. 

The transition rules $\Delta$ consist of 

• The rules 

$f(q_{t_1}, \ldots, q_{t_n}) \to q_{f(t_1, \ldots, t_n)}$ 

each time $q_{f(t_1, \ldots, t_n)} \in Q$ 

• For i = 1, . . . , n, the rules 

$x_i(q_{t_1}, \ldots, q_{t_n}) \to q_u$ 

where u is a subterm of t such that $s_i(t_1, \ldots, t_n){\downarrow} = u$ and $t_j = \Box$ whenever 
$s_i(t_1, \ldots, t_{j-1}, \Box, t_{j+1}, \ldots, t_n){\downarrow} = u$. 

• the rule $\lambda x_1, \ldots, \lambda x_n . q_t \to q_f$ 

Theorem 31. The set of solutions of (1) (up to $\alpha$-conversion) is accepted by 
the □-automaton $\mathcal{A}_{s_1, \ldots, s_n, t}$. 

More about this result, its proof and its extension to fourth order will be 
given in the exercises (see also bibliographic notes). Let us simply give an 
example. 

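Example 36. Let us consider the interpolation equation 

$x(\lambda y_1 \lambda y_2 . y_1,\ \lambda y_3 . f(y_3, y_3)) = f(a, a)$ 

where $y_1, y_2$ are assumed to be of base type o. Then $\mathcal{F} = \{a, f, x_1, x_2, \Box_o\}$, 
$Q = \{q_a, q_{f(a,a)}, q_{\Box_o}, q_f\}$ and the rules of the automaton are: 

$a \to q_a$                                  $f(q_a, q_a) \to q_{f(a,a)}$ 
$\Box_o \to q_{\Box_o}$                      $x_1(q_a, q_{\Box_o}) \to q_a$ 
$x_1(q_{f(a,a)}, q_{\Box_o}) \to q_{f(a,a)}$  $x_2(q_a) \to q_{f(a,a)}$ 
$\lambda x_1 \lambda x_2 . q_{f(a,a)} \to q_f$ 

For instance the term $\lambda x_1 \lambda x_2 . x_1(x_2(x_1(x_1(a, \Box_o), \Box_o)), \Box_o)$ is accepted by 
the automaton: 

$\lambda x_1 \lambda x_2 . x_1(x_2(x_1(x_1(a, \Box_o), \Box_o)), \Box_o)$ 
  $\to^*_{\mathcal{A}}$ $\lambda x_1 \lambda x_2 . x_1(x_2(x_1(x_1(q_a, q_{\Box_o}), q_{\Box_o})), q_{\Box_o})$ 
  $\to_{\mathcal{A}}$ $\lambda x_1 \lambda x_2 . x_1(x_2(x_1(q_a, q_{\Box_o})), q_{\Box_o})$ 
  $\to_{\mathcal{A}}$ $\lambda x_1 \lambda x_2 . x_1(x_2(q_a), q_{\Box_o})$ 
  $\to_{\mathcal{A}}$ $\lambda x_1 \lambda x_2 . x_1(q_{f(a,a)}, q_{\Box_o})$ 
  $\to_{\mathcal{A}}$ $\lambda x_1 \lambda x_2 . q_{f(a,a)}$ 
  $\to_{\mathcal{A}}$ $q_f$ 

And indeed, for all terms $t_1, t_2, t_3$, $\lambda x_1 \lambda x_2 . x_1(x_2(x_1(x_1(a, t_1), t_2)), t_3)$ is a 
solution of the interpolation problem. 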
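
The final claim of the example can be checked on sample values. The sketch below is not the automaton construction: it models the two arguments of x as Python functions and lets Python function application play the role of $\beta$-reduction, which is an assumption adequate only for this small example. First-order result terms are encoded as strings and tuples, and the names `s1`, `s2`, `candidate` are illustrative.

```python
def s1(y1, y2):
    """First argument of x: lambda y1 lambda y2 . y1 (projection on y1)."""
    return y1

def s2(y3):
    """Second argument of x: lambda y3 . f(y3, y3)."""
    return ("f", y3, y3)

def candidate(t1, t2, t3):
    """The term lambda x1 lambda x2 . x1(x2(x1(x1(a, t1), t2)), t3) as a Python function."""
    return lambda x1, x2: x1(x2(x1(x1("a", t1), t2)), t3)

target = ("f", "a", "a")   # the right-hand side f(a, a)

# Try a few arbitrary fillers for the three box positions.
fillers = [
    ("a", "a", "a"),
    (("f", "a", "a"), "a", ("f", ("f", "a", "a"), "a")),
]
for t1, t2, t3 in fillers:
    assert candidate(t1, t2, t3)(s1, s2) == target
print("all sampled instances reduce to f(a, a)")
```

The fillers never influence the result because the first argument of x discards its second parameter, which is exactly what the □ positions in the accepted term express.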



3.5 Exercises 

Exercise 31. Let $\mathcal{F}$ be the alphabet consisting of finitely many unary function 
symbols $a_1, \ldots, a_n$ and a constant 0. 

1. Show that the set S of pairs (of words) 
$\{(a_1^n(a_1(a_2(a_2^p(0)))),\ a_1^n(a_2^p(0))) \mid n, p \in \mathbb{N}\}$ is in Rec. Show that $S^*$ is 
not in Rec, hence that Rec is not closed under transitive closure. 

2. More generally, show that, for any finite rewrite system $\mathcal{R}$ (on the alphabet $\mathcal{F}$!), 
the reduction relation $\to_{\mathcal{R}}$ is in Rec. 

3. Is there any hope to design a class of tree languages which contains Rec, which is 
closed by all Boolean operations and by transitive closure and for which 
emptiness is decidable? Why? 

Exercise 32. Show that the set of pairs $\{(t, f(t, t')) \mid t, t' \in T(\mathcal{F})\}$ is not in Rec. 

Exercise 33. Show that if a binary relation is recognized by a GTT, then its inverse 
is also recognized by a GTT. 

Exercise 34. Give an example of two relations that are recognized by GTTs and 
whose union is not recognized by any GTT. 

Similarly, show that the class of relations recognized by a GTT is not closed by 
complement. Is the class closed by intersection? 

Exercise 35. Give an example of an n-ary relation such that its ith projection followed 
by its ith cylindrification does not give back the original relation. On the contrary, 
show that ith cylindrification followed by ith projection gives back the original relation. 

Exercise 36. About Rec and bounded delay relations. We assume that $\mathcal{F}$ only 
contains unary function symbols and a constant, i.e. we consider words rather than 
trees and we write $u = a_1 \ldots a_n$ instead of $u = a_1(\ldots(a_n(0))\ldots)$. Similarly, $u \cdot v$ 
corresponds to the usual concatenation of words. 

A binary relation R on $T(\mathcal{F})$ is called a bounded delay relation if and only if 

$$\exists k,\ \forall (u, v) \in R,\ \big|\,|u| - |v|\,\big| \leq k$$

R preserves the length if and only if 

$$\forall (u, v) \in R,\ |u| = |v|$$

If A, B are two binary relations, we write $A \cdot B$ (or simply AB) the relation 

$$A \cdot B = \{(u, v) \mid \exists (u_1, v_1) \in A,\ (u_2, v_2) \in B,\ u = u_1 \cdot u_2,\ v = v_1 \cdot v_2\}$$

Similarly, we write (in this exercise!) 

$$A^* = \{(u, v) \mid \exists (u_1, v_1) \in A, \ldots, (u_n, v_n) \in A,\ u = u_1 \ldots u_n,\ v = v_1 \ldots v_n\}$$

1. Given $A, B \in Rec$, is $A \cdot B$ necessarily in Rec? Is $A^*$ necessarily in Rec? Why? 

2. Show that if $A \in Rec$ preserves the length, then $A^* \in Rec$. 

3. Show that if $A, B \in Rec$ and A is of bounded delay, then $A \cdot B \in Rec$. 

4. The family of rational relations is the smallest set of subsets of $T(\mathcal{F})^2$ which 
contains the finite subsets of $T(\mathcal{F})^2$ and which is closed under union, concatenation 
($\cdot$) and *. 

Show that, if A is a bounded delay rational relation, then $A \in Rec$. Is the 
converse true? 

Exercise 37. Let $R_0$ be the rewrite system $\{x \times 0 \to 0;\ 0 \times x \to 0\}$ and $\mathcal{F} = \{0, 1, s, \times\}$. 

1. Construct explicitly the GTT accepting $\to^*_{R_0}$. 

2. Let $R_1 = R_0 \cup \{x \times 1 \to x\}$. Show that $\to^*_{R_1}$ is not recognized by a GTT. 

3. Let $R_2 = R_1 \cup \{1 \times x \to x \times 1\}$. Using a construction similar to the transitive 
closure of GTTs, show that the set $\{t \in T(\mathcal{F}) \mid \exists u \in T(\mathcal{F}),\ t \to^*_{R_2} u,\ u \in NF\}$, 
where NF is the set of terms in normal form for $R_2$, is recognized by a finite 
tree automaton. 

Exercise 38. (*) More generally, prove that given any rewrite system $\{l_i \to r_i \mid 1 \leq 
i \leq n\}$ such that 

1. for all i, $l_i$ and $r_i$ are linear 

2. for all i, if $x \in Var(l_i) \cap Var(r_i)$, then x occurs at depth at most one in $l_i$, 

the set $\{t \in T(\mathcal{F}) \mid \exists u \in NF,\ t \to^*_R u\}$ is recognized by a finite tree automaton. 

What are the consequences of this result? 

(See [Jac96] for details about this result and its applications. Also compare with 
Exercise 16, question 4.) 

Exercise 39. Show that the set of pairs $\{(f(t, t'), t) \mid t, t' \in T(\mathcal{F})\}$ is not definable 
in WSkS. (See also Exercise 32.) 

Exercise 40. Show that the set of pairs of words $\{(w, w') \mid l(w) = l(w')\}$, where l(x) 
is the length of x, is not definable in WSkS. 

Exercise 41. Let $\mathcal{F} = \{a_1, \ldots, a_n, 0\}$ where each $a_i$ is unary and 0 is a constant. 
Consider the following constraint system: terms are built on $\mathcal{F}$, the binary symbols 
$\cap$, $\cup$, the unary symbol $\neg$ and set variables. Formulas are conjunctions of inclusion 
constraints $t \subseteq t'$. The formulas are interpreted by assigning to variables finite subsets 
of $T(\mathcal{F})$, with the expected meaning for other symbols. 

Show that the set of solutions of such constraints is in $Rec_2$. What can we conclude? 

(*) What happens if we remove the condition on the $a_i$'s to be unary? 

Exercise 42. Complete the proof of Proposition 13. 

Exercise 43. Show that the subterm relation is not definable in WSkS. 

Given a term t, write a WSkS formula $\phi_t$ such that a term $u \models \phi_t$ if and only if t 
is a subterm of u. 

Exercise 44. Define in SkS "X is finite". (Hint: express that every totally ordered 
subset of X has an upper bound and use König's lemma.) 

Exercise 45. A tuple $(t_1, \ldots, t_n) \in T(\mathcal{F})^n$ can be represented in several ways as 
a finite sequence of finite sets. The first one is the encoding given in Section 3.3.6, 
overlapping the terms and considering one set for each tuple of symbols. A second one 
consists in having a tuple of sets for each component: one for each function symbol. 

Compare the number of free variables which result from both codings when defining 
an n-ary relation on terms in WSkS. Compare also the definitions of the diagonal $\Delta$ 
using both encodings. How is it possible to translate an encoding into the other one? 

Exercise 46. (*) Let $\mathcal{R}$ be a finite rewrite system all of whose left and right members 
are ground. 

1. Let Termination(x) be the predicate on $T(\mathcal{F})$ which holds true on t when there 
is no infinite sequence of reductions starting from t. Show that adding this 
predicate as an atomic formula in the first-order theory of rewriting, this theory 
remains decidable for ground rewrite systems. 

2. Generalize these results to the case where the left members of $\mathcal{R}$ are linear and 
the right members are ground. 

Exercise 47. The complexity of automata recognizing the set of irreducible ground 
terms. 

1. For each $n \in \mathbb{N}$, give a linear rewrite system $\mathcal{R}_n$ whose size is O(n) and such 
that the minimal automaton accepting the set of irreducible ground terms has 
a size $O(2^n)$. 

2. Assume that for any two strict subterms s, t of left hand side(s) of $\mathcal{R}$, if s and t 
are unifiable, then s is an instance of t or t is an instance of s. Show that there 
is a NFTA $\mathcal{A}$ whose size is linear in $\mathcal{R}$ and which accepts the set of irreducible 
ground terms. 

Exercise 48. Prove Theorem 29. 

Exercise 49. The Propositional Linear-time Temporal Logic. The logic PTL 
is defined as follows: 

Syntax P is a finite set of propositional variables. Each symbol of P is a formula (an 
atomic formula). If $\phi$ and $\psi$ are formulas, then the following are formulas: 

$\phi \wedge \psi$, $\phi \vee \psi$, $\phi \to \psi$, $\neg\phi$, $\phi\,U\,\psi$, $N\phi$, $L\phi$ 

Semantics Let $P^*$ be the set of words over the alphabet P. A word $w \in P^*$ is 
identified with the sequence of letters $w(0)w(1) \ldots w(|w| - 1)$. $w(i..j)$ is the 
word $w(i) \ldots w(j)$. The satisfaction relation is defined by: 

• if $p \in P$, $w \models p$ if and only if $w(0) = p$ 

• The interpretation of logical connectives is the usual one 

• $w \models N\phi$ if and only if $|w| \geq 2$ and $w(1..|w| - 1) \models \phi$ 

• $w \models L\phi$ if and only if $|w| = 1$ 

• $w \models \phi\,U\,\psi$ if and only if there is an index $i \in [0..|w|]$ such that for all 
$j \in [0..i]$, $w(j..|w| - 1) \models \phi$ and $w(i..|w| - 1) \models \psi$. 

Let us recall that the language defined by a formula $\phi$ is the set of words w such 
that $w \models \phi$. 

1. What is the language defined by $N(p_1\,U\,p_2)$ (with $p_1, p_2 \in P$)? 

2. Give PTL formulas defining respectively $P^* p_1 P^*$, $p_1^*$, $(p_1 p_2)^*$. 

3. Give a first-order WS1S formula (i.e. without second-order quantification and 
containing only one free second-order variable) which defines the same language 
as $N(p_1\,U\,p_2)$. 

4. For any PTL formula, give a first-order WS1S formula which defines the same 
language. 

5. Conversely, show that any language defined by a first-order WS1S formula is 
definable by a PTL formula. 

Exercise 50. About 3rd-order interpolation problems 

1. Prove Theorem 31. 

2. Show that the size of the automaton $\mathcal{A}_{s_1, \ldots, s_n, t}$ is $O(n \times |t|)$. 

3. Deduce from Exercise 19 that the existence of a solution to a system of 
interpolation equations of the form $x(s_1, \ldots, s_n) = t$ (where x is a third order variable 
in each equation) is in NP. 

Exercise 51. About general third order matching. 

1. How is it possible to modify the construction of $\mathcal{A}_{s_1, \ldots, s_n, t}$ so as to forbid some 
symbols of t to occur in the solutions? 

2. Consider a third order matching problem u = t where t is in normal form and 
does not contain any free variable. Let $x_1, \ldots, x_n$ be the free variables of u and 
$x_i(s_1, \ldots, s_m)$ be the subterm of u at position p. Show that, for every solution 
$\sigma$, either $u[\Box]_p\sigma{\downarrow} =_\alpha t$ or else $x_i\sigma(s_1\sigma, \ldots, s_m\sigma){\downarrow}$ is in the set $S_p$ defined 
as follows: $v \in S_p$ if and only if there is a subterm t' of t and there are positions 
$p_1, \ldots, p_k$ of t' and variables $z_1, \ldots, z_k$ which are bound above p in u such that 
$v = t'[z_1, \ldots, z_k]_{p_1, \ldots, p_k}$. 

3. By guessing the results of $x_i\sigma(s_1\sigma, \ldots, s_m\sigma)$ and using the previous exercise, 
show that general third order matching is in NP. 



3.6 Bibliographic Notes 

The following bibliographic notes only concern the applications of the usual 
finite tree automata on finite trees (as defined at this stage of the book). We 
are pretty sure that there are many missing references and we are pleased to 
receive more pointers to the literature. 

3.6.1 GTT 

GTTs were introduced in [DTHL87] where they were used for the decidability of 
confluence of ground rewrite systems. 

3.6.2 Automata and Logic 

The development of automata in relation with logic and verification (in the 
sixties) is reported in [Tra95]. This research program was explained by A. 
Church himself in 1962 [Chu62]. 

Milestones of the decidability of monadic second-order logic are the papers 
[Büc60] [Rab69]. Theorem 25 is proved in [TW68]. 

3.6.3 Surveys 

There are numerous surveys on automata and logic. Let us mention some of 
them: M.O. Rabin [Rab77] surveys the decidable theories; W. Thomas [Tho90, 
Tho97] provides an excellent survey of relationships between automata and logic. 

3.6.4 Applications of tree automata to constraint solving 

Concerning applications of tree automata, the reader is also referred to [Dau94] 
which reviews a number of applications of tree automata to rewriting and 
constraints. 

The relation between sorts and tree automata is pointed out in [Com89]. The 
decidability of arbitrary first-order sort constraints (and actually the first order 
theory of finite trees with equality and sort constraints) is proved in [CD94]. 

More general sort constraints involving some second-order terms are studied 
in [Com98b] with applications to a sort constrained equational logic [Com98a]. 

Sort constraints are also applied to specifications and automated inductive 
proofs in [BJ97] where tree automata are used to represent some normal form 
sets. They are used in logic programming and automated reasoning [FSVY91, 
GMW97], in order to get more efficient procedures for fragments which fall 
into the scope of tree automata techniques. They are also used in automated 
deduction in order to increase the expressivity of (counter-)model constructions 
[Pel97]. 

Concerning encompassment, M. Dauchet et al. gave a more general result 
(dropping the linearity requirement) in [DCC95]. We will come back to this 
result in the next chapter. 

3.6.5 Application of tree automata to semantic unification 

Rigid unification was originally considered by J. Gallier et al. [GRS87] who 
showed that this is a key notion in extending the matings method to a logic 
with equality. Several authors worked on this problem and it is out of the scope 
of this book to give a list of references. Let us simply mention that the result 
of Section 3.4.5 can be found in [Vea97b]. Further results on application of tree 
automata to rigid unification can be found in [DGN+98], [GJV98]. 

Tree automata are also used in solving classical semantic unification 
problems. See e.g. [LM93] [KFK97] [IM92]. For instance, in [KFK97], the idea is 
to capture some loops in the narrowing procedure using tree automata. 

3.6.6 Application of tree automata to decision problems 
in term rewriting 

Some of the applications of tree automata to term rewriting follow from the 
results on encompassment theory. Early works in this area are also mentioned 
in the bibliographic notes of Chapter 1. The reader is also referred to the survey 
[GT95]. 

The first-order theory of the binary (many-steps) reduction relation w.r.t. a 
ground rewrite system has been shown decidable by M. Dauchet and S. Tison 
[DT90]. Extensions of the theory, including some function symbols, or 
other predicate symbols like the parallel rewriting or the termination predicate 
(Terminate(t) holds if there is no infinite reduction sequence starting from t), 
or fair termination etc. remain decidable [Tis89]. See also the exercises. 

Both the theory of one step and the theory of many steps rewriting are 
undecidable for arbitrary $\mathcal{R}$ [Tre96]. 

Reduction strategies for term rewriting were first studied by Huet and 
Lévy in 1978 [HL91]. They show there the decidability of strong sequentiality 
for orthogonal rewrite systems. This is based on an approximation of the 
rewrite system which, roughly, only considers the left sides of the rules. Better 
approximations, yielding refined criteria, were further proposed in [Oya93], 
[Com95], [Jac96]. The orthogonality requirement has also been replaced with 
the weaker condition of left linearity. The first relation between tree automata, 
WSkS and reduction strategies is pointed out in [Com95]. Further studies of 
call-by-need strategies, which are still based on tree automata, but do not use 
a detour through monadic second-order logic can be found in [DM97]. For all 
these works, a key property is the preservation of regularity by (many-steps) 
rewriting, which was shown for ground systems in [Bra69], for linear systems 
which do not share variables in [DT90], for shallow systems in [Com95], for right 
linear monadic rewrite systems in [Sal88], for linear semi-monadic rewrite systems 
in [CG90], also called (with slight differences) growing systems in [Jac96]. Growing 
systems are currently the most general class for which the preservation of 
recognizability is known. 

As already pointed out, the decidability of the encompassment theory implies
the decidability of ground reducibility. There are several papers written along
these lines which will be explained in the next chapter.

Finally, approximations of the reachable terms are computed in [Gen97]
using tree automata techniques, which implies the decision of some safety
properties.

3.6.7 Other applications 

The relationship between finite tree automata and higher-order matching is 
studied in [CJ97b]. 

Finite tree automata are also used in logic programming [FSVY91], type
reconstruction [Tiu92] and automated deduction [GMW97].

For further applications of tree automata in the direction of program
verification, see e.g. chapter 5 of this book or [Jon87].

Chapter 4

Automata with Constraints 



4.1 Introduction 

A typical example of a language which is not recognized by a finite tree
automaton is the set of terms $\{f(t,t) \mid t \in T(\mathcal{F})\}$. The reason is that the two
sons of the root are recognized independently and only a fixed finite amount of
information can be carried up to the root position, whereas $t$ may be
arbitrarily large. Therefore, as seen in the application section of the previous
chapter, this imposes some linearity conditions, typically when automata
techniques are applied to rewrite systems or to sort constraints. The shift from
linear to non-linear situations can also be seen as a generalization from tree
automata to DAG (directed acyclic graphs) automata. This is the purpose of the
present chapter: how is it possible to extend the definitions of tree automata
in order to carry over the applications of the previous chapter to (some)
non-linear situations?

Such an extension has been studied in the early 80's by M. Dauchet and J.
Mongy. They define a class of automata which (when working in a top-down
fashion) allow duplications. Considering bottom-up automata, this amounts to
checking equalities between subtrees. This yields the RATEG class. This class
is not closed under complement. If we consider its closure, we get the class
of automata with equality and disequality constraints. This class is studied in
Section 4.2.

Unfortunately, the emptiness problem is undecidable for the class RATEG
(and hence for automata with equality and disequality constraints).
Several decidable subclasses have been studied in the literature. The most
remarkable ones are

• The class of automata with constraints between brothers which, roughly,
allows equality (or disequality) tests only between positions with the same
ancestors. For instance, the set of terms f(t,t) is recognized by such
an automaton. This class is interesting because all properties of tree
automata carry over to this extension and hence most of the applications of
tree automata can be extended, replacing linearity conditions with such
restrictions on non-linearities.

We study this class in Section 4.3.

• The class of reduction automata which, roughly, allows arbitrary
disequality constraints but only a fixed finite amount of equality constraints
on each run of the automaton. For instance the set of terms f(t,t) also
belongs to this class. Though closure properties have to be handled with
care (with the definition sketched above, the class is not closed by
complement), reduction automata are interesting because for example the set
of irreducible terms (w.r.t. an arbitrary, possibly non-linear rewrite
system) is recognized by a reduction automaton. Then the decidability of
ground reducibility is a direct consequence of emptiness decidability for
reduction automata. There is also a logical counterpart: the reducibility
theory which is presented in the linear case in the previous chapter and
which can be shown decidable in the general case using a similar technique.

Reduction automata are studied in Section 4.4.

We also consider in this chapter automata with arithmetic constraints. They
naturally appear when some function symbols are assumed to be associative and
commutative (AC). In such a situation, the sons of an AC symbol can be
permuted and the relevant information is then the number of occurrences of the
same subtree in the multisets of sons. These integer variables (number of
occurrences) are subject to arithmetic constraints which must belong to a
decidable fragment of arithmetic in order to keep closure and decidability
properties.

4.2 Automata with Equality and Disequality Constraints

4.2.1 The Most General Class 

An equality constraint (resp. a disequality constraint) is a predicate on
$T(\mathcal{F})$ written $\pi = \pi'$ (resp. $\pi \neq \pi'$) where $\pi, \pi' \in \{1, \ldots, k\}^*$. Such a predicate
is satisfied on a term $t$, which we write $t \models \pi = \pi'$, if $\pi, \pi' \in \mathcal{P}os(t)$ and
$t|_{\pi} = t|_{\pi'}$ (resp. $\pi \neq \pi'$ is satisfied on $t$ if $\pi = \pi'$ is not satisfied on $t$).

The satisfaction relation $\models$ is extended as usual to any Boolean combination
of equality and disequality constraints. The empty conjunction and the empty
disjunction are respectively written $\top$ (true) and $\bot$ (false).

An automaton with equality and disequality constraints is a tuple
$(Q, \mathcal{F}, Q_f, \Delta)$ where $\mathcal{F}$ is a finite ranked alphabet, $Q$ is a finite set of states, $Q_f$
is a subset of $Q$ of final states and $\Delta$ is a set of transition rules of the form

    $f(q_1, \ldots, q_n) \xrightarrow{c} q$

where $f \in \mathcal{F}$, $q_1, \ldots, q_n, q \in Q$, and $c$ is a Boolean combination of equality
(and disequality) constraints. The state $q$ is called the target state in the
above transition rule.

We write for short AWEDC the class of automata with equality and dise- 
quality constraints. 

Let $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta) \in$ AWEDC. The move relation $\to_{\mathcal{A}}$ is defined as
for NFTA modulo the satisfaction of equality and disequality constraints: let
$t, t' \in T(\mathcal{F} \cup Q)$, then $t \to_{\mathcal{A}} t'$ if and only if

• there is a context $C \in \mathcal{C}(\mathcal{F} \cup Q)$ and some terms $u_1, \ldots, u_n \in T(\mathcal{F})$,

• there exists $f(q_1, \ldots, q_n) \xrightarrow{c} q \in \Delta$,

• $t = C[f(q_1(u_1), \ldots, q_n(u_n))]$ and $t' = C[q(f(u_1, \ldots, u_n))]$,

• $f(u_1, \ldots, u_n) \models c$.

$\xrightarrow[\mathcal{A}]{*}$ is the reflexive and transitive closure of $\to_{\mathcal{A}}$. As in Chapter 1, we usually
write $t \xrightarrow[\mathcal{A}]{*} q$ instead of $t \xrightarrow[\mathcal{A}]{*} q(t)$.

An automaton $\mathcal{A} \in$ AWEDC accepts (or recognizes) a ground term
$t \in T(\mathcal{F})$ if $t \xrightarrow[\mathcal{A}]{*} q$ for some state $q \in Q_f$. More generally, we also say that
$\mathcal{A}$ accepts $t$ in state $q$ iff $t \xrightarrow[\mathcal{A}]{*} q$ (acceptance by $\mathcal{A}$ is the particular case of
acceptance by $\mathcal{A}$ in a final state).

A run of $\mathcal{A}$ on $t$ is a mapping $\rho$ from $\mathcal{P}os(t)$ into $\Delta$ such that:

• the target state of $\rho(\Lambda)$ is in $Q_f$,

• if $t(p) = f$ and the target state of $\rho(p)$ is $q$, then there is a transition rule
$f(q_1, \ldots, q_n) \xrightarrow{c} q$ in $\Delta$ such that for all $1 \le i \le n$, the target state of
$\rho(pi)$ is $q_i$ and $t|_p \models c$.

Note that we do not have here exactly the same definition of a run as in
Chapter 1: instead of the state, we keep also the rule which yielded this state.
This will be useful in the design of an emptiness decision algorithm for
non-deterministic automata with equality and disequality constraints.

The language accepted (or recognized) by an automaton $\mathcal{A} \in$ AWEDC
is the set $L(\mathcal{A})$ of terms $t \in T(\mathcal{F})$ that are accepted by $\mathcal{A}$.

Example 37. Balanced complete binary trees over the alphabet $f$ (binary)
and $a$ (constant) are recognized by the AWEDC $(\{q\}, \{f, a\}, \{q\}, \Delta)$ where $\Delta$
consists of the following rules:

    $r_1$: $a \to q$
    $r_2$: $f(q,q) \xrightarrow{1=2} q$

For example, $t = f(f(a,a), f(a,a))$ is accepted. The mapping which associates
$r_1$ to every position $p$ of $t$ such that $t(p) = a$ and which associates $r_2$ to every
position $p$ of $t$ such that $t(p) = f$ is indeed a successful run: for every position
$p$ of $t$ such that $t(p) = f$, $t|_{p \cdot 1} = t|_{p \cdot 2}$, hence $t|_p \models 1 = 2$.
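The acceptance condition can be tried out on this example with a small bottom-up evaluator. The following is only a sketch under assumptions of our own: terms are encoded as nested tuples, positions as tuples of 1-based child indices, and the helper names (`subterm`, `eq`, `states`, `accepts`) are hypothetical, not part of the text.

```python
# Terms are nested tuples: ("f", t1, t2) or ("a",). Positions are tuples of
# 1-based child indices, so (1, 2) is the position written 12 in the text.

def subterm(t, pos):
    """Return t|pos, or None when pos is not a position of t."""
    for i in pos:
        if i < 1 or i >= len(t):
            return None
        t = t[i]
    return t

def eq(p1, p2):
    """Atomic constraint p1 = p2; it fails if either position is missing."""
    def check(t):
        u, v = subterm(t, p1), subterm(t, p2)
        return u is not None and v is not None and u == v
    return check

def true(t):
    """The constraint that always holds."""
    return True

def states(t, rules):
    """Set of states q such that t reduces to q, computed bottom-up."""
    child_states = [states(c, rules) for c in t[1:]]
    result = set()
    for symbol, lhs, constraint, target in rules:
        if symbol != t[0] or len(lhs) != len(child_states):
            continue
        # The constraint is checked on the subterm rooted at the rule.
        if all(q in s for q, s in zip(lhs, child_states)) and constraint(t):
            result.add(target)
    return result

def accepts(t, rules, final):
    return bool(states(t, rules) & final)

# Example 37: balanced complete binary trees.
RULES = [
    ("a", (), true, "q"),
    ("f", ("q", "q"), eq((1,), (2,)), "q"),
]
a = ("a",)
balanced = ("f", ("f", a, a), ("f", a, a))
unbalanced = ("f", ("f", a, a), a)
print(accepts(balanced, RULES, {"q"}))    # True
print(accepts(unbalanced, RULES, {"q"}))  # False
```

Because `states` returns the full set of reachable states, the same function also illustrates what a state of the determinized automaton of Section 4.2.2 stands for.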



Example 38. Consider the following AWEDC: $(Q, \mathcal{F}, Q_f, \Delta)$ with $\mathcal{F} = \{0, s, f\}$
where $0$ is a constant, $s$ is unary and $f$ has arity 4, $Q = \{q_n, q_0, q_f\}$,
$Q_f = \{q_f\}$, and $\Delta$ consists of the following rules:

    $0 \to q_0$
    $s(q_0) \to q_n$
    $s(q_n) \to q_n$
    $f(q_0, q_0, q_n, q_n) \xrightarrow{3=4} q_f$
    $f(q_0, q_0, q_0, q_0) \to q_f$
    $f(q_0, q_n, q_0, q_n) \xrightarrow{2=4} q_f$
    $f(q_f, q_n, q_n, q_n) \xrightarrow{14=4 \,\wedge\, 21=12 \,\wedge\, 131=3} q_f$

Figure 4.1: A computation of the sum of two natural numbers

This automaton computes the sum of two natural numbers written in base
one in the following sense: if $t$ is accepted by $\mathcal{A}$ then¹ $t = f(t_1, s^n(0), s^m(0), s^{n+m}(0))$
for some $t_1$ and $n, m \ge 0$. Conversely, for each $n, m \ge 0$, there is a term
$f(t_1, s^n(0), s^m(0), s^{n+m}(0))$ which is accepted by the automaton.
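As a worked check of the constraints, the following sketch follows one accepted term encoding 1 + 1 = 2. It assumes that the rules read as follows: the constraint $3=4$ labels $f(q_0,q_0,q_n,q_n) \to q_f$ and the constraint $14=4 \wedge 21=12 \wedge 131=3$ labels $f(q_f,q_n,q_n,q_n) \to q_f$. The term $t_1$ is our own choice of witness.

```latex
\begin{align*}
t   &= f(t_1,\; s(0),\; s(0),\; s(s(0))), \qquad
t_1  = f(0,\; 0,\; s(s(0)),\; s(s(0))) \\[4pt]
% t_1 encodes 0 + 2 = 2 and is accepted by the rule with constraint 3 = 4
t_1 &\xrightarrow{*} f(q_0, q_0, q_n, q_n) \to q_f
    && \text{since } t_1|_{3} = s(s(0)) = t_1|_{4} \\[4pt]
% the three conjuncts of the constraint at the root of t
t|_{14}  &= s(s(0)) = t|_{4}
    && \text{(the sum is unchanged)} \\
t|_{21}  &= 0 = t|_{12}
    && \text{(first summand of } t_1 \text{ is one less)} \\
t|_{131} &= s(0) = t|_{3}
    && \text{(second summand of } t_1 \text{ is one more)} \\[4pt]
t   &\xrightarrow{*} f(q_f, q_n, q_n, q_n) \to q_f
\end{align*}
```

Read bottom-up, each application of the last rule moves one unit from the third argument back to the second, so the invariant "second plus third equals fourth" is preserved from $t_1$ to $t$.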

For instance the term depicted on Figure 4.1 is accepted by the automaton.
Similarly, it is possible to design an automaton of the class AWEDC which
"computes the multiplication" (see exercises).

In order to evaluate the complexity of operations on automata of the class
AWEDC, we need to make precise a representation of the automata and estimate
the space which is necessary for this representation.

The size of a Boolean combination of equality and disequality constraints
is defined by induction:

• $\|\pi = \pi'\| \stackrel{\mathrm{def}}{=} \|\pi \neq \pi'\| \stackrel{\mathrm{def}}{=} |\pi| + |\pi'|$ ($|\pi|$ is the length of $\pi$)

• $\|c \wedge c'\| \stackrel{\mathrm{def}}{=} \|c \vee c'\| \stackrel{\mathrm{def}}{=} \|c\| + \|c'\| + 1$

• $\|\neg c\| \stackrel{\mathrm{def}}{=} \|c\|$

Now, deciding whether $t \models c$ depends on the representation of $t$. If $t$ is
represented as a directed acyclic graph (a DAG) with maximal sharing, then this
can be decided in $O(\|c\|)$ on a RAM. Otherwise, it requires to compute first this
representation of $t$, and hence can be computed in time at most
$O(\|t\| \log \|t\| + \|c\|)$.

¹ $s^n(0)$ denotes $s(\ldots s(0) \ldots)$, with $n$ occurrences of $s$.

From now on, we assume, for complexity analysis, that the terms are
represented with maximal sharing in such a way that checking an equality or a
disequality constraint on $t$ can be completed in a time which is independent of
$\|t\|$.
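One way to obtain such a representation is hash-consing. The sketch below is an assumption-laden illustration (the class `TermDag` and its methods are hypothetical names, and terms are nested tuples): once equal subterms share a node, testing an atomic constraint only walks the two paths and compares two node identifiers.

```python
class TermDag:
    """Maximal sharing: one node id per distinct (symbol, child ids) shape."""

    def __init__(self):
        self._ids = {}    # (symbol, child ids) -> node id
        self.nodes = []   # node id -> (symbol, child ids)

    def add(self, term):
        """Insert a nested-tuple term and return the id of its shared node."""
        key = (term[0], tuple(self.add(c) for c in term[1:]))
        if key not in self._ids:
            self._ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self._ids[key]

    def at(self, node, pos):
        """Follow position pos (1-based child indices); None if it is absent."""
        for i in pos:
            children = self.nodes[node][1]
            if i < 1 or i > len(children):
                return None
            node = children[i - 1]
        return node

    def satisfies_eq(self, node, p1, p2):
        """t |= p1 = p2: both positions exist and reach the same shared node."""
        n1, n2 = self.at(node, p1), self.at(node, p2)
        return n1 is not None and n1 == n2

dag = TermDag()
a = ("a",)
root = dag.add(("f", ("f", a, a), ("f", a, a)))
print(len(dag.nodes))                        # 3 shared nodes for 7 positions
print(dag.satisfies_eq(root, (1,), (2,)))    # True
print(dag.satisfies_eq(root, (1, 1), (2,)))  # False
```

The cost of `satisfies_eq` is proportional to the lengths of the two positions, which matches the bound $O(|\pi| + |\pi'|)$ used in the text.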

The size of an automaton $\mathcal{A} \in$ AWEDC is

    $\|\mathcal{A}\| \stackrel{\mathrm{def}}{=} |Q| + \sum_{f(q_1,\ldots,q_n) \xrightarrow{c} q \,\in\, \Delta} (n + 2 + \|c\|)$

An automaton $\mathcal{A}$ in AWEDC is deterministic if for every $t \in T(\mathcal{F})$, there
is at most one state $q$ such that $t \xrightarrow[\mathcal{A}]{*} q$. It is complete if for every $t \in T(\mathcal{F})$
there is at least one state $q$ such that $t \xrightarrow[\mathcal{A}]{*} q$.

When every constraint is a tautology, then our definition of automata
reduces to the definition of Chapter 1. However, in such a case, the notions of
determinacy do not fully coincide, as noticed in Chapter 1, page 21.

Proposition 18. Given $t \in T(\mathcal{F})$ and $\mathcal{A} \in$ AWEDC, deciding whether $t$ is
accepted by $\mathcal{A}$ can be completed in polynomial time (linear time for a
deterministic automaton).

Proof. Because of the DAG representation of $t$, the satisfaction of a constraint
$\pi = \pi'$ on $t$ can be completed in time $O(|\pi| + |\pi'|)$. Thus, if $\mathcal{A}$ is
deterministic, the membership test can be performed in time $O(\|t\| + \|\mathcal{A}\| + MC)$ where
$MC = \max(\|c\| \mid c \text{ is a constraint of a rule of } \mathcal{A})$. If $\mathcal{A}$ is nondeterministic, the
complexity of the algorithm will be $O(\|t\| \times \|\mathcal{A}\| \times MC)$. □

4.2.2 Reducing Non-determinism and Closure Properties 

Proposition 19. For every automaton $\mathcal{A} \in$ AWEDC, there is a complete
automaton $\mathcal{A}'$ which accepts the same language as $\mathcal{A}$. The size $\|\mathcal{A}'\|$ is polynomial
in $\|\mathcal{A}\|$ and the computation of $\mathcal{A}'$ can be performed in polynomial time (for a
fixed alphabet). If $\mathcal{A}$ is deterministic, then $\mathcal{A}'$ is deterministic.

Proof. The proof is the same as for Theorem 2: we add a trash state and
every transition is possible to the trash state. However, this does not keep the
determinism of the automaton. We need the following more careful computation
in order to preserve the determinism.

We also add a single trash state $q_\bot$. The additional transitions are computed
as follows: for each function symbol $f \in \mathcal{F}$ and each tuple of states (including the
trash state) $q_1, \ldots, q_n$, if there is no transition $f(q_1, \ldots, q_n) \xrightarrow{c} q \in \Delta$, then we
simply add the rule $f(q_1, \ldots, q_n) \to q_\bot$ to $\Delta$. Otherwise, let $f(q_1, \ldots, q_n) \xrightarrow{c_i} s_i$
($i = 1, \ldots, m$) be all rules in $\Delta$ whose left member is $f(q_1, \ldots, q_n)$. We add the
rule $f(q_1, \ldots, q_n) \xrightarrow{c'} q_\bot$ to $\Delta$, where $c' = \neg \bigvee_{i=1}^{m} c_i$. □
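The construction in this proof can be sketched as follows. This is a minimal illustration under our own assumptions: constraints are encoded as small tagged tuples such as `("eq", p1, p2)`, `("not", c)`, `("or", c1, c2)` and `("true",)`, a rule is a 4-tuple, and the names `TRASH`, `disjunction` and `complete` are hypothetical.

```python
from itertools import product

TRASH = "q_trash"  # hypothetical name for the trash state

def disjunction(constraints):
    """Fold a non-empty list of constraints into nested ("or", c1, c2) nodes."""
    result = constraints[0]
    for c in constraints[1:]:
        result = ("or", result, c)
    return result

def complete(states, arities, rules):
    """Return the rules plus the trash rules of the careful completion.

    A rule is (symbol, tuple of states, constraint, target state).
    """
    all_states = list(states) + [TRASH]
    new_rules = list(rules)
    for symbol, arity in arities.items():
        for lhs in product(all_states, repeat=arity):
            guards = [c for (f, l, c, _) in rules if f == symbol and l == lhs]
            if guards:
                # Go to the trash state exactly when no existing rule applies.
                trash_guard = ("not", disjunction(guards))
            else:
                trash_guard = ("true",)
            new_rules.append((symbol, lhs, trash_guard, TRASH))
    return new_rules

# The automaton of Example 37.
EXAMPLE_37 = [
    ("a", (), ("true",), "q"),
    ("f", ("q", "q"), ("eq", (1,), (2,)), "q"),
]
completed = complete(["q"], {"a": 0, "f": 2}, EXAMPLE_37)
for rule in completed:
    print(rule)
```

On this input the sketch also emits a rule for the constant `a` guarded by the negation of true; it is unsatisfiable and could be dropped, which a cleanup pass (not shown) would do.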

Proposition 20. For every automaton $\mathcal{A} \in$ AWEDC, there is a deterministic
automaton $\mathcal{A}'$ which accepts the same language as $\mathcal{A}$. $\mathcal{A}'$ can be computed in
exponential time and its size is exponential in the size of $\mathcal{A}$. Moreover, if $\mathcal{A}$ is
complete, then $\mathcal{A}'$ is complete.



TATA — September 6, 2005 




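Proof. The construction is the same as in Theorem 4: states of $\mathcal{A}'$ are sets of
states of $\mathcal{A}$. Final states of $\mathcal{A}'$ are those which contain at least one final state
of $\mathcal{A}$. The construction time complexity as well as the size of $\mathcal{A}'$ are also of the
same magnitude as in Theorem 4. The only difference is the computation of the
constraint: if $S_1, \ldots, S_n, S$ are sets of states, in the deterministic automaton,
the rule $f(S_1, \ldots, S_n) \xrightarrow{c} S$ is labeled by a constraint $c$ defined by:

    $c = \Big( \bigwedge_{q \in S} \; \bigvee_{\substack{f(q_1,\ldots,q_n) \xrightarrow{c_r} q \,\in\, \Delta \\ q_i \in S_i,\ i \le n}} c_r \Big) \wedge \Big( \bigwedge_{q \notin S} \; \bigwedge_{\substack{f(q_1,\ldots,q_n) \xrightarrow{c_r} q \,\in\, \Delta \\ q_i \in S_i,\ i \le n}} \neg c_r \Big)$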

Let us prove that $t$ is accepted by $\mathcal{A}$ in states $q_1, \ldots, q_k$ (and no other states) if
and only if $t$ is accepted by $\mathcal{A}'$ in the state $\{q_1, \ldots, q_k\}$:

⇒ Assume that $t \xrightarrow[\mathcal{A}]{n} q_i$ (i.e. $t \xrightarrow[\mathcal{A}]{*} q_i$ in $n$ steps), for $i = 1, \ldots, k$. We prove, by
induction on $n$, that

    $t \xrightarrow[\mathcal{A}']{n} \{q_1, \ldots, q_k\}$.

If $n = 1$, then $t$ is a constant and $t \to S$ is a rule of $\mathcal{A}'$ where
$S = \{q \mid t \to_{\mathcal{A}} q\}$.

Assume now that $n > 1$. Let, for each $i = 1, \ldots, k$,

    $t = f(t_1, \ldots, t_p) \xrightarrow[\mathcal{A}]{*} f(q^i_1, \ldots, q^i_p) \to_{\mathcal{A}} q_i$

and $f(q^i_1, \ldots, q^i_p) \xrightarrow{c_i} q_i$ be a rule of $\mathcal{A}$ such that $t \models c_i$. By induction
hypothesis, each term $t_j$ is accepted by $\mathcal{A}'$ in a state $S_j \supseteq \{q^1_j, \ldots, q^k_j\}$.
Moreover, by definition of $S = \{q_1, \ldots, q_k\}$, if $t \xrightarrow[\mathcal{A}]{*} q'$ then $q' \in S$.
Therefore, for every transition rule $f(q'_1, \ldots, q'_p) \xrightarrow{c'} q'$ of $\mathcal{A}$ such
that $q' \notin S$ and $q'_j \in S_j$ for every $j \le p$, we have $t \not\models c'$. Then $t$ satisfies
the above defined constraint $c$.

⇐ Assume that $t \xrightarrow[\mathcal{A}']{n} S$. We prove by induction on $n$ that, for every $q \in S$,
$t \xrightarrow[\mathcal{A}]{n} q$.

If $n = 1$, then $S$ is the set of states $q$ such that $t \to_{\mathcal{A}} q$, hence the property.

Assume now that

    $t = f(t_1, \ldots, t_p) \xrightarrow[\mathcal{A}']{*} f(S_1, \ldots, S_p) \to_{\mathcal{A}'} S$.

Let $f(S_1, \ldots, S_p) \xrightarrow{c} S$ be the last rule used in this reduction. Then
$t \models c$ and, by definition of $c$, for every state $q \in S$, there is a rule
$f(q_1, \ldots, q_p) \xrightarrow{c_r} q \in \Delta$ such that $q_i \in S_i$ for every $i \le p$ and $t \models c_r$. By
induction hypothesis, for each $i$, $t_i \xrightarrow[\mathcal{A}']{m_i} S_i$ implies $t_i \xrightarrow[\mathcal{A}]{m_i} q_i$ ($m_i < n$)
and hence $t \xrightarrow[\mathcal{A}]{*} f(q_1, \ldots, q_p) \to_{\mathcal{A}} q$.

Thus, by construction of the final states set, a ground term $t$ is accepted by $\mathcal{A}'$
iff $t$ is accepted by $\mathcal{A}$.

Now, we have to prove that $\mathcal{A}'$ is deterministic indeed. Assume that $t \xrightarrow[\mathcal{A}']{*} S$
and $t \xrightarrow[\mathcal{A}']{*} S'$. Assume moreover that $S \neq S'$ and that $t$ is the smallest term (in
size) with the property of being recognized in two different states. Then there
exist $S_1, \ldots, S_n$ such that $t \xrightarrow[\mathcal{A}']{*} f(S_1, \ldots, S_n)$ and such that $f(S_1, \ldots, S_n) \xrightarrow{c} S$
and $f(S_1, \ldots, S_n) \xrightarrow{c'} S'$ are transition rules of $\mathcal{A}'$, with $t \models c$ and $t \models c'$.
By symmetry, we may assume that there is a state $q \in S$ such that $q \notin S'$.
Then, by definition, there are some states $q_i \in S_i$, for every $i \le n$, and a rule
$f(q_1, \ldots, q_n) \xrightarrow{c_r} q$ of $\mathcal{A}$ where $c_r$ occurs positively in $c$, and is therefore satisfied
by $t$: $t \models c_r$. By construction of the constraint of $\mathcal{A}'$, $c_r$ must occur negatively in
the second part of (the conjunction) $c'$. Therefore, $t \models c'$ contradicts $t \models c_r$. □



Example 39. Consider the following automaton on the alphabet $\mathcal{F} = \{a, f\}$
where $a$ is a constant and $f$ is a binary symbol: $Q = \{q, q_\bot\}$, $Q_f = \{q\}$ and $\Delta$
contains the following rules:

    $a \to q$                      $f(q,q) \xrightarrow{1=2} q$        $f(q,q) \to q_\bot$
    $f(q_\bot,q) \to q_\bot$        $f(q,q_\bot) \to q_\bot$            $f(q_\bot,q_\bot) \to q_\bot$

This is the (non-deterministic) complete version of the automaton of
Example 37.

Then the deterministic automaton computed as in the previous proposition
is given by:

    $a \to \{q\}$
    $f(\{q\},\{q\}) \xrightarrow{1=2 \wedge \bot} \{q\}$
    $f(\{q\},\{q\}) \xrightarrow{1=2} \{q,q_\bot\}$
    $f(\{q\},\{q\}) \xrightarrow{1 \neq 2} \{q_\bot\}$
    $f(\{q\},\{q_\bot\}) \to \{q_\bot\}$
    $f(\{q_\bot\},\{q\}) \to \{q_\bot\}$
    $f(\{q_\bot\},\{q_\bot\}) \to \{q_\bot\}$
    $f(\{q,q_\bot\},\{q\}) \xrightarrow{1=2 \wedge \bot} \{q\}$
    $f(\{q,q_\bot\},\{q\}) \xrightarrow{1=2} \{q,q_\bot\}$
    $f(\{q,q_\bot\},\{q_\bot\}) \to \{q_\bot\}$
    $f(\{q,q_\bot\},\{q,q_\bot\}) \xrightarrow{1=2} \{q,q_\bot\}$
    $f(\{q,q_\bot\},\{q\}) \xrightarrow{1 \neq 2} \{q_\bot\}$
    $f(\{q\},\{q,q_\bot\}) \xrightarrow{1=2} \{q,q_\bot\}$
    $f(\{q\},\{q,q_\bot\}) \xrightarrow{1 \neq 2} \{q_\bot\}$
    $f(\{q,q_\bot\},\{q,q_\bot\}) \xrightarrow{1 \neq 2} \{q_\bot\}$
    $f(\{q\},\{q,q_\bot\}) \xrightarrow{1=2 \wedge \bot} \{q\}$
    $f(\{q,q_\bot\},\{q,q_\bot\}) \xrightarrow{1=2 \wedge \bot} \{q\}$
    $f(\{q_\bot\},\{q,q_\bot\}) \to \{q_\bot\}$

For instance, the constraint $1=2 \wedge \bot$ is obtained by the conjunction of the label
of $f(q,q) \xrightarrow{1=2} q$ and the negation of the constraint labelling $f(q,q) \to q_\bot$
(which is $\top$).

Some of the constraints, such as $1=2 \wedge \bot$, are unsatisfiable, hence the
corresponding rules can be removed. If we finally rename the two accepting states
$\{q\}$ and $\{q,q_\bot\}$ into a single state $q_f$ (this is possible since by replacing one
of these states by the other in any left hand side of a transition rule, we get
another transition rule), then we get a simplified version of the deterministic
automaton:

    $a \to q_f$                      $f(q_f,q_f) \xrightarrow{1=2} q_f$
    $f(q_f,q_f) \xrightarrow{1 \neq 2} q_\bot$      $f(q_\bot,q_f) \to q_\bot$
    $f(q_f,q_\bot) \to q_\bot$        $f(q_\bot,q_\bot) \to q_\bot$


Proposition 21. The class AWEDC is effectively closed by all Boolean
operations. Union requires linear time, intersection requires quadratic time and
complement requires exponential time. The respective sizes of the AWEDC
obtained by these constructions are of the same magnitude as the time complexity.

Proof. The proof of this proposition can be obtained from the proof of
Theorem 5 (Chapter 1, pages 28-29) with straightforward modifications. The
only difference lies in the product automaton for the intersection: we have to
consider conjunctions of constraints. More precisely, if we have two AWEDC
$\mathcal{A}_1 = (Q_1, \mathcal{F}, Q_{f1}, \Delta_1)$ and $\mathcal{A}_2 = (Q_2, \mathcal{F}, Q_{f2}, \Delta_2)$, we construct an AWEDC
$\mathcal{A} = (Q_1 \times Q_2, \mathcal{F}, Q_{f1} \times Q_{f2}, \Delta)$ such that if $f(q_1, \ldots, q_n) \xrightarrow{c} q \in \Delta_1$ and
$f(q'_1, \ldots, q'_n) \xrightarrow{c'} q' \in \Delta_2$, then $f((q_1,q'_1), \ldots, (q_n,q'_n)) \xrightarrow{c \wedge c'} (q,q') \in \Delta$. The
AWEDC $\mathcal{A}$ recognizes $L(\mathcal{A}_1) \cap L(\mathcal{A}_2)$. □
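The product construction for intersection is short enough to sketch directly. The code below is an illustration under assumptions of our own: constraints are tagged tuples (`("true",)`, `("eq", p1, p2)`, `("not", c)`, `("and", c1, c2)`), terms are nested tuples, and the names `intersect`, `holds`, `states`, `BALANCED` and `ROOT_F` are hypothetical.

```python
def subterm(t, pos):
    """Return t|pos (1-based child indices), or None if pos is not in t."""
    for i in pos:
        if i < 1 or i >= len(t):
            return None
        t = t[i]
    return t

def holds(c, t):
    """Evaluate a constraint given as a tagged tuple on the term t."""
    tag = c[0]
    if tag == "true":
        return True
    if tag == "eq":
        u, v = subterm(t, c[1]), subterm(t, c[2])
        return u is not None and u == v
    if tag == "not":
        return not holds(c[1], t)
    if tag == "and":
        return holds(c[1], t) and holds(c[2], t)
    raise ValueError("unknown constraint: %r" % (c,))

def intersect(rules1, rules2):
    """Product rules: pair the states and conjoin the constraints."""
    return [
        (f, tuple(zip(lhs1, lhs2)), ("and", c1, c2), (q1, q2))
        for (f, lhs1, c1, q1) in rules1
        for (g, lhs2, c2, q2) in rules2
        if f == g and len(lhs1) == len(lhs2)
    ]

def states(t, rules):
    """Set of states reached by t, computed bottom-up."""
    below = [states(child, rules) for child in t[1:]]
    return {
        target
        for (f, lhs, c, target) in rules
        if f == t[0] and len(lhs) == len(below)
        and all(q in s for q, s in zip(lhs, below)) and holds(c, t)
    }

TRUE = ("true",)
# Example 37 (balanced trees) and an unconstrained automaton for "root is f".
BALANCED = [("a", (), TRUE, "q"), ("f", ("q", "q"), ("eq", (1,), (2,)), "q")]
ROOT_F = [("a", (), TRUE, "leaf")] + [
    ("f", (x, y), TRUE, "node") for x in ("leaf", "node") for y in ("leaf", "node")
]
PRODUCT = intersect(BALANCED, ROOT_F)
FINAL = {("q", "node")}  # product of the two sets of final states

a = ("a",)
print(bool(states(("f", a, a), PRODUCT) & FINAL))  # True
print(bool(states(a, PRODUCT) & FINAL))            # False
```

The number of product rules is at most the product of the two rule counts, which is where the quadratic bound of the proposition comes from.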



4.2.3 Undecidability of Emptiness 

Theorem 32. The emptiness problem for AWEDC is undecidable.

Proof. We reduce the Post Correspondence Problem (PCP). If $w_1, \ldots, w_n$ and
$w'_1, \ldots, w'_n$ are the word sequences of the PCP problem over the alphabet $\{a, b\}$,
we let $\mathcal{F}$ contain $h$ (ternary), $a$, $b$ (unary) and $0$ (constant). Let us recall that the
answer for the above instance of the PCP is a sequence $i_1, \ldots, i_p$ (which may
contain some repetitions) such that $w_{i_1} \ldots w_{i_p} = w'_{i_1} \ldots w'_{i_p}$.

If $w \in \{a, b\}^*$, $w = a_1 \ldots a_k$ and $t \in T(\mathcal{F})$, we write $w(t)$ the term
$a_1(\ldots(a_k(t))\ldots) \in T(\mathcal{F})$.

Now, we construct $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta) \in$ AWEDC as follows:

• $Q$ contains a state $q_v$ for each prefix $v$ of one of the words $w_i$, $w'_i$ (including
$q_{w_i}$ and $q_{w'_i}$), as well as 3 extra states: $q_0$, $q$ and $q_f$. We assume that $a$ and
$b$ are both prefix of at least one of the words $w_i$, $w'_i$. $Q_f = \{q_f\}$.

• $\Delta$ contains the following rules:

    $a(q_0) \to q_a$                  $b(q_0) \to q_b$
    $a(q_v) \to q_{a \cdot v}$  if $q_v, q_{a \cdot v} \in Q$
    $b(q_v) \to q_{b \cdot v}$  if $q_v, q_{b \cdot v} \in Q$
    $a(q_{w_i}) \to q_a$              $b(q_{w_i}) \to q_b$
    $a(q_{w'_i}) \to q_a$             $b(q_{w'_i}) \to q_b$

$\Delta$ also contains the rules:

    $0 \to q_0$
    $h(q_0, q_0, q_0) \to q$
    $h(q_{w_i}, q, q_{w'_i}) \xrightarrow{1 \cdot 1^{|w_i|} = 2 \cdot 1 \;\wedge\; 3 \cdot 1^{|w'_i|} = 2 \cdot 3} q$
    $h(q_{w_i}, q, q_{w'_i}) \xrightarrow{1 \cdot 1^{|w_i|} = 2 \cdot 1 \;\wedge\; 3 \cdot 1^{|w'_i|} = 2 \cdot 3 \;\wedge\; 1 = 3} q_f$

The rule with left member $h(q_0, q_0, q_0)$ recognizes the beginning of a Post
sequence. The rules with left members $h(q_{w_i}, q, q_{w'_i})$ ensure that we are really in
presence of a successor in the PCP sequences: the constraint expresses that the
subterm at position 1 is obtained by concatenating some $w_i$ with the term at
position $2 \cdot 1$ and that the subterm at position 3 is obtained by concatenating
$w'_i$ (with the same index $i$) with the subterm at position $2 \cdot 3$. Finally, entering
the final state is subject to the additional constraint $1 = 3$. This last constraint
expresses that we went through two identical words with the $w_i$ sequences and the
$w'_i$ sequences respectively. (See Figure 4.2.)

The details that this automaton indeed accepts the solutions of the PCP are
left to the reader.

Then the language accepted by $\mathcal{A}$ is non-empty if and only if the PCP has a
solution. Since PCP is undecidable, emptiness for AWEDC is also undecidable. □

4.3 Automata with Constraints Between Brothers

The undecidability result of the previous section led to look for subclasses which
have the desired closure properties, contain (properly) the classical tree
automata and still keep the decidability of emptiness. This is the purpose of the
class AWCBB:

An automaton $\mathcal{A} \in$ AWEDC is an automaton with constraints between
brothers if every equality (resp. disequality) constraint has the form
$i = j$ (resp. $i \neq j$) where $i, j \in \mathbb{N}_+$.

AWCBB is the set of automata with constraints between brothers.

Example 40. The set of terms $\{f(t,t) \mid t \in T(\mathcal{F})\}$ is accepted by an automaton
of the class AWCBB, because the automaton of Example 37 is in AWCBB
indeed.



4.3.1 Closure Properties 

Proposition 22. AWCBB is a stable subclass of AWEDC w.r.t. Boolean
operations (union, intersection, complementation).

Proof. It is sufficient to check that the constructions of Proposition 21 preserve
the property of being a member of AWCBB. □

Figure 4.2: An automaton in AWEDC accepting the solutions of PCP

Recall that the time complexity of each such construction is the same in
AWEDC and in the unconstrained case: union and intersection are polynomial,
complementation requires determinization and is exponential.

4.3.2 Emptiness Decision

To decide emptiness we would like to design for instance a "cleaning algorithm"
as in Theorem 11. As in this result, the correctness and completeness of the
marking technique relies on a pumping lemma. Is there an analog of Lemma 1
in the case of automata of the class AWCBB?

There are additional difficulties. For instance consider the following example.

Example 41. $\mathcal{A}$ contains only one state and the rules

    $a \to q$        $f(q,q) \xrightarrow{1 \neq 2} q$
    $b \to q$

Now consider the term $f(f(a,b),b)$ which is accepted by the automaton. $f(a,b)$
and $b$ yield the same state $q$. Hence, for a classical finite tree automaton, we
may replace $f(a,b)$ with $b$ and still get a term which is accepted by $\mathcal{A}$. This is
not the case here since, replacing $f(a,b)$ with $b$, we get the term $f(b,b)$ which
is not accepted. The reason for this phenomenon is easy to understand: some
constraint which was satisfied before the pumping is no longer valid after the
pumping.
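The failure of naive pumping in this example can be replayed in a few lines. This is a small sketch with assumptions of our own: terms are nested tuples, the automaton is hard-coded as reading $a \to q$, $b \to q$ and $f(q,q) \to q$ under the constraint $1 \neq 2$, and `accepted` and `replace` are hypothetical helper names.

```python
def accepted(t):
    """Membership for the one-state automaton: a -> q, b -> q,
    and f(q, q) -> q under the disequality constraint 1 != 2."""
    if t in (("a",), ("b",)):
        return True
    if t[0] == "f" and len(t) == 3:
        left, right = t[1], t[2]
        return accepted(left) and accepted(right) and left != right
    return False

def replace(t, pos, new):
    """Return t with the subterm at position pos (1-based) replaced by new."""
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], new),) + t[i + 1:]

a, b = ("a",), ("b",)
t = ("f", ("f", a, b), b)
pumped = replace(t, (1,), b)   # replace f(a, b) with b at position 1

print(accepted(t))        # True
print(accepted(pumped))   # False: f(b, b) violates 1 != 2
```

Both `f(a, b)` and `b` are accepted in the same state, so the replacement would be harmless for an automaton without constraints.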

Hence the problem is to preserve the satisfaction of constraints along term
replacements. First, concerning equality constraints, we may see the terms as
DAGs in which each pair of subterms which is checked for equality is considered
as a single subterm referenced in two different ways. Then replacing one of
its occurrences automatically replaces the other occurrences and preserves the
equality constraints. This is what is formalized below.

Preserving the equality constraints. Let $t$ be a term accepted by the
automaton $\mathcal{A}$ in AWCBB. Let $\rho$ be a run of the automaton on $t$. With every
position $p$ of $t$, we associate the conjunction $cons(p)$ of atomic (equality
or disequality) constraints that are checked by $\rho(p)$ and satisfied by $t$.
More precisely: let $\rho(p) = f(q_1, \ldots, q_n) \xrightarrow{c'} q$; $cons(p) = decomp(c', p)$ where
$decomp(c', p)$ is recursively defined by: $decomp(\top, p) = \top$,
$decomp(c_1 \wedge c_2, p) = decomp(c_1, p) \wedge decomp(c_2, p)$ and
$decomp(c_1 \vee c_2, p) = decomp(c_1, p)$ if $t|_p \models c_1$,
$decomp(c_1 \vee c_2, p) = decomp(c_2, p)$ otherwise. We can show by a simple
induction that $t|_p \models cons(p)$.

Now, we define the equivalence relation $\equiv_t$ on the set of positions of $t$ as the
least equivalence relation such that:

• if $i = j \in cons(p)$, then $p \cdot i \equiv_t p \cdot j$

• if $p \equiv_t p'$ and $p \cdot \pi \in \mathcal{P}os(t)$, then $p \cdot \pi \equiv_t p' \cdot \pi$

Note that in the last case, we have $p' \cdot \pi \in \mathcal{P}os(t)$. Of course, if $p \equiv_t p'$, then
$t|_p = t|_{p'}$ (but the converse is not necessarily true). Note also (and this is a
property of the class AWCBB) that $p \equiv_t p'$ implies that the lengths of $p$ and $p'$
are the same, hence, if $p \neq p'$, they are incomparable w.r.t. the prefix ordering.
We can also derive from this remark that each equivalence class is finite (we
may assume that each equality constraint of the form $i = i$ has been replaced
by $\top$).

Example 42. Consider the automaton whose transition rules are:

    $r_1$: $f(q,q) \to q$                    $r_2$: $a \to q$
    $r_3$: $f(q,q) \xrightarrow{1=2} q_f$     $r_4$: $b \to q$
    $r_5$: $f(q,q_f) \to q_f$                $r_6$: $f(q_f,q) \to q_f$
    $r_7$: $f(q,q_f) \to q$                  $r_8$: $f(q_f,q) \to q$

Let $t = f(b, f(f(f(a,a), f(a,b)), f(f(a,a), f(a,b))))$. A possible run of $\mathcal{A}$ on $t$ is
$r_5(r_4, r_3(r_1(r_1(r_2, r_2), r_1(r_2, r_4)), r_8(r_3(r_2, r_2), r_1(r_2, r_4))))$. Equivalence classes
of positions are:

    {Λ}, {1}, {2}, {21, 22}, {211, 221}, {212, 222},
    {2111, 2211, 2112, 2212}, {2121, 2221}, {2122, 2222}
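These classes can be recomputed mechanically with a union-find structure. The sketch below rests on assumptions of our own: terms are nested tuples, positions are tuples of 1-based indices, and the input `equalities` lists, for each position $p$, the pairs $(i, j)$ with $i = j$ in $cons(p)$. For the run above, the rule with constraint $1=2$ is used at positions 2 and 221. The function names are hypothetical.

```python
def positions(t, p=()):
    """Yield every position of t as a tuple of 1-based child indices."""
    yield p
    for i, child in enumerate(t[1:], start=1):
        yield from positions(child, p + (i,))

def classes(t, equalities):
    """Equivalence classes of the positions of t.

    equalities maps a position p to the pairs (i, j) such that the atomic
    constraint i = j belongs to cons(p).
    """
    pos = list(positions(t))
    parent = {p: p for p in pos}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    def union(p, r):
        p, r = find(p), find(r)
        if p == r:
            return False
        parent[r] = p
        return True

    # First rule: i = j in cons(p) makes p.i and p.j equivalent.
    for p, pairs in equalities.items():
        for i, j in pairs:
            union(p + (i,), p + (j,))

    # Second rule: equivalent positions have equivalent sons; iterate
    # one step at a time until nothing changes.
    changed = True
    while changed:
        changed = False
        for p in pos:
            for r in pos:
                if p < r and find(p) == find(r):
                    k = 1
                    while p + (k,) in parent and r + (k,) in parent:
                        if union(p + (k,), r + (k,)):
                            changed = True
                        k += 1

    groups = {}
    for p in pos:
        groups.setdefault(find(p), set()).add(p)
    return {frozenset(g) for g in groups.values()}

a, b = ("a",), ("b",)
X = ("f", ("f", a, a), ("f", a, b))
T = ("f", b, ("f", X, X))
EQUALITIES = {(2,): [(1, 2)], (2, 2, 1): [(1, 2)]}

for cls in sorted(sorted(c) for c in classes(T, EQUALITIES)):
    print(cls)
```

The quadratic scan over pairs of positions keeps the sketch short; it is not meant to be efficient.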



Let us recall the principle of pumping, for finite bottom-up tree automata
(see Chapter 1). When a ground term $C[C'[t]]$ ($C$ and $C'$ are two contexts)
is such that $t$ and $C'[t]$ are accepted in the same state by a NFTA $\mathcal{A}$, then
every term $C[C'^n[t]]$ ($n \ge 0$) is accepted by $\mathcal{A}$ in the same state as $C[C'[t]]$. In
other words, any $C[C'^n[t]] \in L(\mathcal{A})$ may be reduced by pumping it up to the
term $C[t] \in L(\mathcal{A})$. We consider here a position $p$ (corresponding to the term
$C'[t]$) and its equivalence class $[p]$ modulo $\equiv_t$. The simultaneous replacement
on $[p]$ with $t$ in $u$, written $u[t]_{[p]}$, is defined as the term obtained by successively
replacing the subterm at position $p'$ with $t$ for each position $p' \in [p]$. Since any
two positions in $[p]$ are incomparable, the replacement order is irrelevant. Now,
a pumping is a pair $(C[C'[t]]_p, C[C'^n[t]]_{[p]})$ where $C'[t]$ and $t$ are accepted in
the same state.

Preserving the disequality constraints. We have seen on Example 41
that, if $t$ is accepted by the automaton, replacing one of its subterms, say $u$,
with a term $v$ accepted in the same state as $u$, does not necessarily yield an
accepted term. However, the idea is now that, if we have sufficiently many
such candidates $v$, at least one of the replacements will keep the satisfaction of
disequality constraints.

This is what the following lemma shows: it states that minimal
accepted terms cannot contain too many subterms accepted in the same state.

Lemma 5. Given any total simplification ordering, a minimal term accepted
by a deterministic automaton in AWCBB contains at most $|Q| \times N$ distinct
subterms where $N$ is the maximal arity of a function symbol and $|Q|$ is the
number of states of the automaton.

Figure 4.3: Constructing a smaller term accepted by the automaton

Proof. If $\rho$ is a run, let $\tau_\rho$ be the mapping from positions to states such that
$\tau_\rho(p)$ is the target state of $\rho(p)$.

If $t$ is accepted by the automaton (let $\rho$ be a successful run on $t$) and contains
at least $1 + |Q| \times N$ distinct subterms, then there are at least $N + 1$ positions
$p_1, \ldots, p_{N+1}$ such that $\tau_\rho(p_1) = \ldots = \tau_\rho(p_{N+1})$ and $t|_{p_1}, \ldots, t|_{p_{N+1}}$ are
distinct. Assume for instance that $t|_{p_{N+1}}$ is the largest term (in the given total
simplification ordering) among $t|_{p_1}, \ldots, t|_{p_{N+1}}$. We claim that one of the terms
$v_i = t[t|_{p_i}]_{[p_{N+1}]}$ ($i \le N$) is accepted by the automaton.

For each $i \le N$, we may define unambiguously a tree $\rho_i$ by: $\rho_i = \rho[\rho|_{p_i}]_{[p_{N+1}]}$.

First, note that, by determinacy, for each position $p \in [p_{N+1}]$, $\tau_\rho(p) =
\tau_\rho(p_{N+1}) = \tau_\rho(p_i)$. To show that there is a $\rho_i$ which is a run, it remains to find
a $\rho_i$ the constraints of which are satisfied. Equality constraints of any $\rho_i$ are
satisfied, from the construction of the equivalence classes (details are left to the
reader).

Concerning disequality constraints, we choose $i$ in such a way that all
subterms at brother positions of $p_{N+1}$ are distinct from $t|_{p_i}$ (this choice is possible
since $N$ is the maximal arity of a function symbol and there are $N$ distinct
candidates). We get a replacement as depicted on Figure 4.3.

Let $p_{N+1} = \pi \cdot k$ where $k \in \mathbb{N}$ ($\pi$ is the position immediately above $p_{N+1}$).
Every disequality in $cons(\pi)$ is satisfied by choice of $i$. Moreover, if $p' \in [p_{N+1}]$
and $p' = \pi' \cdot k'$ with $k' \in \mathbb{N}$, then every disequality in $cons(\pi')$ is satisfied
since $v_i|_{\pi} = v_i|_{\pi'}$.

Hence we constructed a term which is smaller than $t$ and which is accepted
by the automaton. This yields the lemma. □



Theorem 33. Emptiness can be decided in polynomial time for deterministic
automata in AWCBB.

Proof. The basic idea is that, if we have enough distinct terms in states $q_1, \ldots, q_n$,
then the transition $f(q_1, \ldots, q_n) \xrightarrow{c} q$ is possible. Use a marking algorithm (as
in Theorem 3) and keep for each state the terms that are known to be accepted
in this state. It is sufficient to keep at most $N$ terms in each state ($N$ is the
maximal arity of a function symbol) according to Lemma 5 and the determinacy
hypothesis (terms in different states are distinct). More precisely, we use the
following algorithm:

    input: AWCBB A = (Q, F, Q_f, Δ)
    begin
        -- Marked is a mapping which associates each state with a set of
        -- terms accepted in that state.
        Set Marked to the function which maps each state to the empty set
        repeat
            Set Marked(q) to Marked(q) ∪ {t}
            where
                f ∈ F_n, t_1 ∈ Marked(q_1), ..., t_n ∈ Marked(q_n),
                f(q_1, ..., q_n) --c--> q ∈ Δ,
                t = f(t_1, ..., t_n) and t |= c,
                |Marked(q)| ≤ N - 1,
        until no term can be added to any Marked(q)
        output: true if, for every state q_f ∈ Q_f, Marked(q_f) = ∅.
    end

□
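The marking algorithm can be sketched in executable form as follows. This is only an illustration under assumptions of our own: terms are nested tuples, a constraint is given as a Python predicate on the term, the function name `is_empty` is hypothetical, and the bound is clamped to at least 1 so that alphabets containing only constants are handled (a guard that is not in the text).

```python
from itertools import product

def is_empty(rules, final, max_arity):
    """Marking algorithm for a deterministic automaton with constraints
    between brothers.

    rules: list of (symbol, tuple of states, predicate on the term, target).
    Returns True when no final state gets marked.
    """
    bound = max(max_arity, 1)  # keep at most N terms per state
    marked = {}                # state -> set of terms accepted in that state
    changed = True
    while changed:
        changed = False
        for symbol, lhs, constraint, target in rules:
            # Snapshot the candidate arguments before mutating any bucket.
            pools = [list(marked.get(q, ())) for q in lhs]
            bucket = marked.setdefault(target, set())
            for args in product(*pools):
                t = (symbol,) + args
                if len(bucket) < bound and t not in bucket and constraint(t):
                    bucket.add(t)
                    changed = True
    return all(not marked.get(q) for q in final)

def always(t):
    return True

def brothers_differ(t):
    """The constraint 1 != 2 on a binary symbol."""
    return t[1] != t[2]

def brothers_equal(t):
    """The constraint 1 = 2 on a binary symbol."""
    return t[1] == t[2]

# Only one term reaches q, so the disequality 1 != 2 can never hold.
ONE_CONSTANT = [("a", (), always, "q"),
                ("f", ("q", "q"), brothers_differ, "qf")]
# With a second constant, f(a, b) is accepted.
TWO_CONSTANTS = ONE_CONSTANT + [("b", (), always, "q")]
# The automaton of Example 37.
BALANCED = [("a", (), always, "q"), ("f", ("q", "q"), brothers_equal, "q")]

print(is_empty(ONE_CONSTANT, {"qf"}, 2))   # True
print(is_empty(TWO_CONSTANTS, {"qf"}, 2))  # False
print(is_empty(BALANCED, {"q"}, 2))        # False
```

Since every bucket holds at most N terms, each pass inspects a bounded number of candidate terms per rule, in line with the polynomial bound of the theorem (for a fixed maximal arity).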



Complexity. For non-deterministic automata, an exponential time algorithm
is derived from Proposition 20 and Theorem 33. Actually, in the non-deterministic
case, the problem is EXPTIME-complete.

We may indeed reduce the following problem, which is known to be
EXPTIME-complete, to non-emptiness decision for non-deterministic AWCBB.

Instance: $n$ tree automata $\mathcal{A}_1, \ldots, \mathcal{A}_n$ over $\mathcal{F}$.

Answer: "yes" iff the intersection of the respective languages recognized by
$\mathcal{A}_1, \ldots, \mathcal{A}_n$ is not empty.

We may assume without loss of generality that the state sets of $\mathcal{A}_1, \ldots, \mathcal{A}_n$
(called respectively $Q_1, \ldots, Q_n$) are pairwise disjoint, and that every $\mathcal{A}_i$ has a
single final state called $q^f_i$. We also assume that $n = 2^k$ for some integer $k$. If
this is not the case, let $k$ be the smallest integer $i$ such that $n < 2^i$ and let
$n' = 2^k$. We consider a second instance of the above problem: $\mathcal{A}'_1, \ldots, \mathcal{A}'_{n'}$
where

    $\mathcal{A}'_i = \mathcal{A}_i$ for each $i \le n$.

    $\mathcal{A}'_i = (\{q\}, \mathcal{F}, \{q\}, \{f(q, \ldots, q) \to q \mid f \in \mathcal{F}\})$ for each $n < i \le n'$.

Note that the tree automaton in the second case is universal, i.e. it accepts
every term of $T(\mathcal{F})$. Hence, the answer is "yes" for $\mathcal{A}'_1, \ldots, \mathcal{A}'_{n'}$ iff it is "yes"
for $\mathcal{A}_1, \ldots, \mathcal{A}_n$.

Now, we add a single new binary symbol $g$ to $\mathcal{F}$, getting $\mathcal{F}'$, and consider
the following AWCBB $\mathcal{A}$:

    $\mathcal{A} = (\bigcup_{i=1}^{n} Q_i \cup \{q_1, \ldots, q_{n-1}\}, \mathcal{F}', \{q_1\}, \Delta)$

where $q_1, \ldots, q_{n-1}$ are new states, and the transitions of $\Delta$ are:

• every transition rule of $\mathcal{A}_1, \ldots, \mathcal{A}_n$ is a transition rule of $\mathcal{A}$,

• for each $i < \frac{n}{2}$, $g(q_{2i}, q_{2i+1}) \xrightarrow{1=2} q_i$ is a transition rule of $\mathcal{A}$,

• for each $i$, $\frac{n}{2} \le i \le n - 1$, $g(q^f_{2i}, q^f_{2i+1}) \xrightarrow{1=2} q_i$ is a transition rule of $\mathcal{A}$,
where, for $n \le j \le 2n - 1$, $q^f_j$ stands for the final state of $\mathcal{A}_{j-n+1}$.

Note that $\mathcal{A}$ is non-deterministic, even if every $\mathcal{A}_i$ is deterministic.

We can show by induction on $k$ ($n = 2^k$) that the answer to the above
problem is "yes" iff the language recognized by $\mathcal{A}$ is not empty. Moreover, the
size of $\mathcal{A}$ is linear in the size of the initial problem and $\mathcal{A}$ is constructed in a time
which is linear in its size. This proves the EXPTIME-hardness of emptiness
decision for AWCBB.

4.3.3 Applications 

The main différence between AWCBB and NFTA is the non-closure of AWCBB 
under projection and cylindrification. Actually, the shift from automata on trees 
to automata on tuples of trees cannot be extended to the class AWCBB. 

As long as we are interested in automata recognizing sets of trees, all results
on NFTA (and all applications) can be extended to the class AWCBB (with
a bigger complexity). For instance, Theorem 26 (sort constraints) can be
extended to interpretations of sorts as languages accepted by AWCBB.
Proposition 15 (encompassment) can be easily generalized to the case of non-linear
terms in which non-linearities only occur between brother positions, provided
that we replace NFTA with AWCBB. Theorem 27 can also be generalized to
the reducibility theory with predicates $\trianglelefteq_t$ where $t$ is a non-linear term, provided
that non-linearities in $t$ only occur between brother positions.

However, we can no longer invoke an embedding into WSkS. The important
point is that this theory only requires the weak notion of recognizability on
tuples ($Rec_\times$). Hence we do not need automata on tuples, but only tuples
of automata. As an example of application, we get a decision algorithm for
ground reducibility of a term $t$ w.r.t. left-hand sides $l_1, \dots, l_n$, provided that all
non-linearities in $t, l_1, \dots, l_n$ occur at brother positions: simply compute the
automata $A_i$ accepting the terms that encompass $l_i$ and check that
$L(A) \subseteq L(A_1) \cup \dots \cup L(A_n)$.

Finally, the application to reduction strategies does not carry over to the case
of non-linear terms because it really needs automata on tuples.

4.4 Reduction Automata

As we have seen above, the first-order theory of finitely many unary
encompassment predicates $\trianglelefteq_{t_1}, \dots, \trianglelefteq_{t_n}$ (reducibility theory) is decidable when
non-linearities in the terms $t_i$ are restricted to brother positions. What happens
when we drop the restrictions and consider arbitrary terms $t_1, \dots, t_n, t$? It
turns out that the theory remains decidable, as we will see. Intuitively, we make
impossible counter examples like the one in the proof of Theorem 32 (stating
undecidability of the emptiness problem for AWEDC) with an additional
condition that, using the automaton which accepts the set of terms encompassing $t$,
we may only check for a bounded number of equalities along each branch. That
is the idea of the next definition of reduction automata.

4.4.1 Definition and Closure Properties

A reduction automaton $A$ is a member of AWEDC such that there is an
ordering on the states of $A$ such that,

for each rule $f(q_1, \dots, q_n) \xrightarrow{\pi_1 = \pi_2 \wedge c} q$, $q$ is strictly smaller than each
$q_i$.

In case of an automaton with $\epsilon$-transitions $q \to q'$ we also require $q'$ to be
not larger than $q$.
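
Once a candidate ordering is given, this condition is easy to test. The following Python sketch makes several assumptions that are not in the text: a rule is a hypothetical tuple `(symbol, argument_states, has_equality, target)`, the ordering is a total one given as a `rank` dictionary (a smaller rank means a smaller state), and the definition is read as constraining the rules that carry an equality test. It does not check $\epsilon$-transitions.

```python
def is_reduction_ordering(rules, rank):
    """Check the ordering condition of reduction automata for a given ranking.

    rules: iterable of (symbol, argument_states, has_equality, target)
    rank:  dict mapping each state to an integer, smaller = smaller state
    """
    for _symbol, args, has_equality, target in rules:
        if has_equality and not all(rank[target] < rank[q] for q in args):
            return False
    return True


# A few rules in the spirit of the example that follows (state names are ours).
sample_rules = [
    ("a", (), False, "top"),
    ("g", ("top", "top"), False, "gxy"),
    ("g", ("gxy", "top"), True, "qf"),
    ("g", ("gxy", "gxy"), True, "qf"),
    ("g", ("qf", "top"), False, "qf"),
]
ok = is_reduction_ordering(sample_rules, {"qf": 0, "gxy": 1, "top": 2})
```

Finding an ordering, rather than checking one, amounts to looking for a ranking compatible with all the strict requirements; the sketch deliberately leaves that search out.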

Example 43. Consider the set of terms on the alphabet $\mathcal{F} = \{a, g\}$
encompassing $g(g(x,y),x)$. It is accepted by the following reduction automaton, the
final state of which is $q_f$, and $q_f$ is minimal in the ordering on states.

$a \to q_\top$        $g(q_\top, q_\top) \to q_{g(x,y)}$

$g(q_\top, q_{g(x,y)}) \to q_{g(x,y)}$

$g(q_{g(x,y)}, q_\top) \xrightarrow{11=2} q_f$        $g(q_{g(x,y)}, q_\top) \xrightarrow{11 \ne 2} q_{g(x,y)}$

$g(q_{g(x,y)}, q_{g(x,y)}) \xrightarrow{11=2} q_f$        $g(q_{g(x,y)}, q_{g(x,y)}) \xrightarrow{11 \ne 2} q_{g(x,y)}$

$g(q, q_f) \to q_f$        $g(q_f, q) \to q_f$

where $q \in \{q_\top, q_{g(x,y)}, q_f\}$.
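
To see the automaton of Example 43 at work, here is a small Python sketch that runs it bottom-up. The encoding is our own assumption: the constant is the string `'a'`, a term $g(l, r)$ is the tuple `('g', l, r)`, and the states are named `'top'`, `'gxy'` and `'qf'`. The constraint $11 = 2$ compares the first child of the first child with the second child.

```python
from itertools import product


def run(t):
    """State reached by the (deterministic) automaton of Example 43 on t."""
    if t == 'a':
        return 'top'
    _, left, right = t
    ql, qr = run(left), run(right)
    if ql == 'qf' or qr == 'qf':
        return 'qf'          # g(q, qf) -> qf and g(qf, q) -> qf
    if ql == 'top':
        return 'gxy'         # g(top, top) and g(top, gxy) both give gxy
    # ql == 'gxy': left is itself a g-term, so position 11 exists
    return 'qf' if left[1] == right else 'gxy'


def encompasses(t):
    """Direct test: does t contain a subterm of the form g(g(s, u), s)?"""
    if t == 'a':
        return False
    _, left, right = t
    if left != 'a' and left[1] == right:
        return True
    return encompasses(left) or encompasses(right)


def terms(depth):
    """All ground terms over {a, g} of height at most depth."""
    if depth == 0:
        return ['a']
    smaller = terms(depth - 1)
    return ['a'] + [('g', l, r) for l, r in product(smaller, repeat=2)]
```

Comparing `run(t) == 'qf'` with `encompasses(t)` over `terms(3)` is a quick sanity check that the rules accept exactly the terms encompassing $g(g(x,y),x)$, at least on small inputs.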

This construction can be generalized, along the lines of the proof of
Proposition 15 (page 96):

Proposition 23. The set of terms encompassing a term $t$ is accepted by a
deterministic and complete reduction automaton. The size of this automaton is
polynomial in $\|t\|$, as is the time complexity for its construction.

As usual, we are now interested in closure properties: 

Proposition 24. The class of reduction automata is closed under union and
intersection. It is not closed under complementation.

Proof. The constructions for union and intersection are the same as in the proof
of Proposition 21, and therefore, the respective time complexity and sizes are
the same. The proof that the class of reduction automata is closed under these
constructions is left as an exercise. Consider the set $L$ of ground terms on the
alphabet $\{a, f\}$ defined by $a \in L$ and for every $t \in L$ which is not $a$, $t$ has a
subterm of the form $f(s, s')$ where $s \ne s'$. The set $L$ is accepted by a
(non-deterministic, non-complete) reduction automaton, but its complement is the
set of balanced binary trees and it cannot be accepted by a reduction automaton
(see Exercise 56). □
The weak point is of course the non-closure under complement.
Consequently, it is not possible to reduce the non-determinism.
However, we have a weak version of stability:

Proposition 25. • With each reduction automaton, we can associate a
complete reduction automaton which accepts the same language. Moreover,
this construction preserves the determinism.

• The class of complete deterministic reduction automata is closed under
complement.

4.4.2 Emptiness Decision

Theorem 34. Emptiness is decidable for the class of reduction automata.

The proof of this result is quite complicated and gives quite high upper
bounds on the complexity (a tower of several exponentials). Hence, we are not
going to reproduce it here. Let us only sketch how it works, in the case of
deterministic reduction automata.

As in Section 4.3.2, we have to preserve both equality and disequality
constraints.

Concerning equality constraints, we also define an equivalence relation
between positions (of equal subtrees). We cannot claim any longer that two
equivalent positions have the same length. However, some of the properties of the
equivalence classes are preserved: first, they are all finite and their cardinal
can be bounded by a number which only depends on the automaton, because
of the condition with the ordering on states (this is actually not true for the
class AWCBB). Then, we can compute a bound $b_2$ (which only depends on the
automaton) such that the difference of the lengths of two equivalent positions
is smaller than $b_2$. Nevertheless, as in Section 4.3.2, equalities are not a real
problem, as soon as the automaton is deterministic. Indeed, pumping can then
be defined on equivalence classes of positions. If the automaton is not
deterministic, the problem is more difficult since we cannot guarantee that we reach the
same state at two equivalent positions, hence we have to restrict our attention
to some particular runs of the automaton.

Handling disequalities requires more care; the number of distinct subterms of
a minimal accepted term cannot be bounded as for AWCBB by $|Q| \times N$, where $N$
is the maximal arity of a function symbol. The problem is the possible "overlap"
of disequalities checked by the automaton. As in Example 41, a pumping may
yield a term which is no longer accepted, since a disequality checked somewhere
in the term is no longer satisfied. In such a case, we say that the pumping creates
an equality. Then, we distinguish two kinds of equalities created by a pumping:
the close equalities and the remote equalities. Roughly, an equality created
by a pumping $(t[v(u)]_p, t[u]_p)$ is a pair of positions $(\pi \cdot \pi_1, \pi \cdot \pi_2)$ of $t[v(u)]_p$ which
was checked for disequality by the run $\rho$ at position $\pi$ on $t[v(u)]_p$ and such that
$t[u]_p|_{\pi \cdot \pi_1} = t[u]_p|_{\pi \cdot \pi_2}$ ($\pi$ is the longest common prefix to both members of the
pair). This equality $(\pi \cdot \pi_1, \pi \cdot \pi_2)$ is a close equality if $\pi \le p < \pi \cdot \pi_1$ or
$\pi \le p < \pi \cdot \pi_2$. Otherwise ($p \ge \pi \cdot \pi_1$ or $p \ge \pi \cdot \pi_2$), it is a remote equality. The
different situations are depicted on Figures 4.4 and 4.5.

Figure 4.4: A close equality is created

Figure 4.5: A remote equality is created

One possible proof sketch is:

• First show that it is sufficient to consider equalities that are created at
positions around which the states are incomparable w.r.t. $>$.

• Next, show that, for a deep enough path, there is at least one pumping 
which does not yield a close equality (this makes use of a combinatorial 
argument; the bound is an exponential in the maximal size of a constraint). 

• For remote equalities, pumping is not sufficient. However, if some
pumping creates a remote equality anyway, this means that there are "big" equal
terms in $t$. Then we switch to another branch of the tree, combining
pumping in both subtrees to find one (again using a combinatorial argument)
such that no equality is created.

Of course, this is a very sketchy proof. The reader is referred to the
bibliography for more information about the proof.

4.4.3 Finiteness Decision

The following result is quite difficult to establish. We only mention it for
the sake of completeness.

Theorem 35. Finiteness of the language is decidable for the class of reduction
automata.

4.4.4 Term Rewriting Systems 

There is a strong relationship between reduction automata and term rewriting.
We mention it for readers interested in that topic.

Proposition 26. Given a term rewriting system $\mathcal{R}$, the set of ground $\mathcal{R}$-normal
forms is recognizable by a reduction automaton, the size of which is exponential
in the size of $\mathcal{R}$. The time complexity of the construction is exponential.

Proof. The set of $\mathcal{R}$-reducible ground terms can be defined as the union of sets
of ground terms encompassing the left members of rules of $\mathcal{R}$. Thus, by
Propositions 23 and 24 the set of $\mathcal{R}$-reducible ground terms is accepted by a
deterministic and complete reduction automaton. For the union, we use the product
construction, preserving determinism (see the proof of Theorem 5, Chapter 1)
with the price of an exponential blowup. The set of ground $\mathcal{R}$-normal forms
is the complement of the set of ground $\mathcal{R}$-reducible terms, and it is therefore
accepted by a reduction automaton, according to Proposition 25. □

Thus, we have the following consequence of Theorems 35 and 34.

Corollary 5. Emptiness and finiteness of the language of ground $\mathcal{R}$-normal
forms is decidable for every term rewriting system $\mathcal{R}$.

Let us cite another important result concerning recognizability of sets of normal
forms.

Theorem 36. The membership of the language of ground normal forms to the
class of recognizable tree languages is decidable.
4.4.5 Application to the Reducibility Theory

Consider the reducibility theory of Section 3.4.2: there are unary predicate
symbols $\trianglelefteq_t$ which are interpreted as the set of terms which encompass $t$. However,
we now accept non-linear terms $t$ as indices.

Propositions 23, 24 and 25 yield the following result:

Theorem 37. The reducibility theory associated with any set of terms is
decidable.

And, as in the previous chapter, we have, as an immediate corollary:

Corollary 6. Ground reducibility is decidable.

4.5 Other Decidable Subclasses 

Complexity issues and restricted classes. There are two classes of
automata with equality and disequality constraints for which tighter complexity
results are known:

• For the class of automata containing only disequality constraints,
emptiness can be decided in deterministic exponential time. For any term
rewriting system $\mathcal{R}$, the set of ground $\mathcal{R}$-normal forms is still
recognizable by an automaton of this subclass of reduction automata.

• For the class of deterministic reduction automata for which the constraints
"cannot overlap", emptiness can be decided in polynomial time.

Combination of AWCBB and reduction automata. If you relax the
condition on equality constraints in the transition rules of reduction automata so
as to allow constraints between brothers, you obtain the biggest known subclass
of AWEDC with a decidable emptiness problem.

Formally, these automata, called generalized reduction automata, are
members of AWEDC such that there is an ordering on the states set such that,

for each rule $f(q_1, \dots, q_n) \xrightarrow{\pi_1 = \pi_2 \wedge c} q$, $q$ is a lower bound of $\{q_1, \dots, q_n\}$
and moreover, if $|\pi_1| > 1$ or $|\pi_2| > 1$, then $q$ is strictly smaller than
each $q_i$.

The closure and decidability results for reduction automata may be
transposed to generalized reduction automata, though with a longer proof for the
emptiness decision. Generalized reduction automata can thus be used for the
decision of reducibility theory extended by some restricted sort declarations. In
this extension, in addition to encompassment predicates $\trianglelefteq_t$, we allow a family
of unary sort predicates $\cdot \in S$, where $S$ is a sort symbol. But sort declarations
are limited to atoms of the form $t \in S$ where non-linear variables in $t$
only occur at brother positions. This fragment is decidable by an analog of
Theorem 37 for generalized reduction automata.
4.6 Tree Automata with Arithmetic Constraints

Tree automata deal with finite trees which have a width bounded by the
maximal arity of the signature, but there is no limitation on the depth of the trees.
A natural idea is to relax the restriction on the width of terms by allowing
functions of variadic arity. This has been considered by several authors for
applications to graph theory, typing in object-oriented languages, temporal logic
and automated deduction. In these applications, variadic functions are set or
multiset constructors in some sense, therefore they enjoy additional properties
like associativity and/or commutativity, and several types of tree automata have
been designed for handling these properties. We describe here a class of tree
automata which recognize terms built with usual function symbols and multiset
constructors. Therefore, we deal not only with terms, but with so-called flat
terms. Equality on these terms is no longer the syntactical identity, but it is
extended by the equality of multisets under permutation of their elements. To
recognize sets of flat terms with automata, we shall use constrained rules where
the constraints are Presburger arithmetic formulas which set conditions on the
multiplicities of terms in multisets. These automata enjoy similar properties to
NFTA and are used to test completeness of function definitions and inductive
reducibility when associative-commutative functions are involved, provided that
some syntactical restrictions hold.

4.6.1 Flat Trees 

The set of function symbols $\mathcal{G}$ is composed of $\mathcal{F}$, the set of function symbols,
and of $\mathcal{M}$, the set of function symbols for building multisets. For simplicity we
shall assume that there is only one symbol of the latter form, denoted by $\sqcup$ and
written as an infix operator. The set of variables is denoted by $X$. Flat terms
are terms generated by the non-terminal $T$ of the following grammar.



$N ::= 1 \mid 2 \mid 3 \dots$
$T ::= S \mid U$   (flat terms)
$S ::= x \mid f(T_1, \dots, T_n)$   (flat terms of sort $\mathcal{F}$)
$U ::= N_1.S_1 \sqcup \dots \sqcup N_p.S_p$   (flat terms of sort $U$)

where $x \in X$, $n \ge 0$ is the arity of $f$, $p \ge 1$ and $\sum_{i=1}^{p} N_i \ge 2$. Moreover the
inequality $S_i \ne_P S_j$ holds for $i \ne j$, $1 \le i, j \le p$, where $=_P$ is defined as the
smallest congruence such that:

• $x =_P x$,

• $f(s_1, \dots, s_n) =_P f(t_1, \dots, t_n)$ if $f \in \mathcal{F}$ and $s_i =_P t_i$ for $i = 1, \dots, n$,

• $n_1.s_1 \sqcup \dots \sqcup n_p.s_p =_P m_1.t_1 \sqcup \dots \sqcup m_q.t_q$ if $p = q$ and there is some
permutation $\sigma$ on $\{1, \dots, p\}$ such that $s_i =_P t_{\sigma(i)}$ and $n_i = m_{\sigma(i)}$ for
$i = 1, \dots, p$.



Example 44. $3.a$ and $3.a \sqcup 2.f(x, b)$ are flat terms, but $2.a \sqcup 1.a \sqcup f(x, b)$ is
not, since $2.a$ and $1.a$ must be grouped together to make $3.a$.
The usual notions on terms can be generalized easily for flat terms. We
recall only what is needed in the following. A flat term is ground if it contains
no variables. The root of a flat term is defined by

• for the flat terms of sort $\mathcal{F}$, $root(x) = x$, $root(f(t_1, \dots, t_n)) = f$,

• for the flat terms of sort $U$, $root(s_1 \sqcup \dots \sqcup s_n) = \sqcup$.

Our notion of subterm is slightly different from the usual one. We say that
$s$ is a subterm of $t$ if and only if

• either $s$ and $t$ are identical,

• or $t = f(s_1, \dots, s_n)$ and $s$ is a subterm of some $s_i$,

• or $t = n_1.t_1 \sqcup \dots \sqcup n_p.t_p$ and $s$ is a subterm of some $t_i$.

For simplicity, we extend $\sqcup$ to an operation between flat terms $s$, $t$ denoting
(any) flat term obtained by regrouping elements of sort $\mathcal{F}$ in $s$ and $t$ which
are equivalent modulo $=_P$, leaving the other elements unchanged. For instance
$s = 2.a \sqcup 1.f(a, a)$ and $t = 3.b \sqcup 2.f(a, a)$ yields $s \sqcup t = 2.a \sqcup 3.b \sqcup 3.f(a, a)$.
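
The regrouping behind flat terms can be sketched in Python. The encoding is an assumption made for illustration only: a term of sort $\mathcal{F}$ is a string or a tuple `('f', arg1, ...)`, and a term of sort $U$ is a `frozenset` of `(term, multiplicity)` pairs. With this canonical form, equality modulo $=_P$ becomes plain equality of Python values. The sketch does not enforce the side condition that the multiplicities sum to at least 2.

```python
from collections import Counter


def multiset(*pairs):
    """Build a sort-U flat term from (multiplicity, term) pairs.

    Equal terms are grouped, so 2.a and 1.a become 3.a.
    """
    counts = Counter()
    for n, term in pairs:
        counts[term] += n
    return frozenset(counts.items())


def union(s, t):
    """The extended operation on two sort-U flat terms: regroup equal elements."""
    counts = Counter(dict(s))
    counts.update(dict(t))      # Counter.update adds multiplicities
    return frozenset(counts.items())


s = multiset((2, 'a'), (1, ('f', 'a', 'a')))
t = multiset((3, 'b'), (2, ('f', 'a', 'a')))
st = union(s, t)
```

Here `st` is the encoding of $2.a \sqcup 3.b \sqcup 3.f(a, a)$, matching the instance given above, and the order in which the pairs are listed does not matter.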

4.6.2 Automata with Arithmetic Constraints 

There is some regularity in flat terms that is likely to be captured by some class
of automata-like recognizers. For instance, the set of flat terms such that all
integer coefficients occurring in the terms are even seems to be easily
recognizable, since the predicate even(n) can be easily decided. The class of automata
that we describe now has been designed for accepting such sets of ground flat
terms. A flat tree automaton with arithmetic constraints (NFTAC) over
$\mathcal{G}$ is a tuple $(Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ where

• $Q_{\mathcal{F}} \cup Q_U$ is a finite set of states, such that

- $Q_{\mathcal{F}}$ is the set of states of sort $\mathcal{F}$,

- $Q_U$ is the set of states of sort $U$,

- $Q_{\mathcal{F}} \cap Q_U = \emptyset$,

• $Q_f \subseteq Q_{\mathcal{F}} \cup Q_U$ is the set of final states,

• $\Delta$ is a set of rules of the form:

- $f(q_1, \dots, q_n) \to q$, for $n \ge 0$, $f \in \mathcal{F}_n$, $q_1, \dots, q_n \in Q_{\mathcal{F}} \cup Q_U$, and
$q \in Q_{\mathcal{F}}$,

- $N.q \xrightarrow{c(N)} q'$, where $q \in Q_{\mathcal{F}}$, $q' \in Q_U$, and $c$ is a Presburger
arithmetic formula (see footnote 2) with the unique free variable $N$,

- $q_1 \sqcup q_2 \to q_3$ where $q_1, q_2, q_3 \in Q_U$.

Moreover we require that

- $q_1 \sqcup q_2 \to q_3$ is a rule of $\Delta$ implies that $q_2 \sqcup q_1 \to q_3$ is also a rule of
$\Delta$,

- $q_1 \sqcup (q_2 \sqcup q_3) \to q_4$ is a rule of $\Delta$ implies that $(q_1 \sqcup q_2) \sqcup q_3 \to q_4$ is
also a rule of $\Delta$, where $q_2 \sqcup q_3$ (resp. $q_1 \sqcup q_2$) denotes any state $q'$ such
that there is a rule $q_2 \sqcup q_3 \to q'$ (resp. $q_1 \sqcup q_2 \to q'$).

These two conditions on $\Delta$ will ensure that two flat terms equivalent modulo
$=_P$ reach the same states.

Footnote 2: Presburger arithmetic is first order arithmetic with addition and constants 0 and 1. This
fragment of arithmetic is known to be decidable.

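Example 45. Let $\mathcal{F} = \{a, f\}$ and let $\mathcal{A} = (Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ where
$Q_{\mathcal{F}} = \{q\}$, $Q_U = \{q_u\}$,

$Q_f = \{q_u\}$,

$\Delta = \{\ a \to q,\quad N.q \xrightarrow{\exists n : N = 2n} q_u,\quad f(\_, \_) \to q,\quad q_u \sqcup q_u \to q_u\ \}$

where _ stands for $q$ or $q_u$.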



Let $\mathcal{A} = (Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ be a flat tree automaton. The move relation
$\to_{\mathcal{A}}$ is defined by: let $t, t' \in T(\mathcal{F} \cup Q, X)$, then $t \to_{\mathcal{A}} t'$ if and only if there is a
context $C \in \mathcal{C}(\mathcal{G} \cup Q)$ such that $t = C[s]$ and $t' =_P C[s']$ where

• either there is some $f(q_1, \dots, q_n) \to q' \in \Delta$ and $s = f(q_1, \dots, q_n)$, $s' = q'$,

• or there is some $N.q \xrightarrow{c(N)} q' \in \Delta$ and $s = n.q$ with $\models c(n)$, $s' = q'$,

• or there is some $q_1 \sqcup q_2 \to q_3 \in \Delta$ and $s = q_1 \sqcup q_2$, $s' = q_3$.

$\to^*_{\mathcal{A}}$ is the reflexive and transitive closure of $\to_{\mathcal{A}}$.



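Example 46. Using the automaton of the previous example, one has

$2.a \sqcup 6.f(a,a) \sqcup 2.f(a, 2.a) \to^*_{\mathcal{A}} 2.q \sqcup 6.f(q,q) \sqcup 2.f(q, 2.q)$
$\to^*_{\mathcal{A}} 2.q \sqcup 6.q \sqcup 2.f(q, q_u)$
$\to_{\mathcal{A}} 2.q \sqcup 6.q \sqcup 2.q$
$\to^*_{\mathcal{A}} q_u \sqcup q_u \sqcup q_u \to_{\mathcal{A}} q_u \sqcup q_u \to_{\mathcal{A}} q_u$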

We now define semilinear flat languages. Let $\mathcal{A} = (Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ be
a flat tree automaton. A ground flat term $t$ is accepted by $\mathcal{A}$ if there is some
$q \in Q_f$ such that $t \to^*_{\mathcal{A}} q$. The flat tree language $L(\mathcal{A})$ accepted by $\mathcal{A}$ is the
set of all ground flat terms accepted by $\mathcal{A}$. A set $L$ of flat terms is semilinear if
$L = L(\mathcal{A})$ for some NFTAC $\mathcal{A}$. Two flat tree automata are equivalent if
they recognize the same language.



Example 47. The language of terms accepted by the automaton of Example 45
is the set of ground flat terms with root $\sqcup$ such that for each subterm
$n_1.s_1 \sqcup \dots \sqcup n_p.s_p$ we have that each $n_i$ is an even number.
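
The automaton of Example 45 is small enough to be simulated directly. The Python sketch below relies on an encoding that is our assumption, not the text's: `'a'` is the constant, `('f', l, r)` is $f(l, r)$, and a sort-$U$ term is a `frozenset` of `(term, multiplicity)` pairs. The function returns the state reached, or `None` when no run exists.

```python
def state(t):
    """Return 'q', 'qu', or None for a ground flat term t (Example 45)."""
    if isinstance(t, frozenset):
        # sort U: every component n.s must satisfy the rule
        # N.q -[exists n: N = 2n]-> qu, i.e. s reaches q and n is even
        for elem, n in t:
            if state(elem) != 'q' or n % 2 != 0:
                return None
        return 'qu'          # qu U qu -> qu merges the components
    if t == 'a':
        return 'q'           # a -> q
    _f, left, right = t      # f(_, _) -> q, where _ is q or qu
    if state(left) is None or state(right) is None:
        return None
    return 'q'


def accepted(t):
    """Final states are {qu}."""
    return state(t) == 'qu'


two_a = frozenset({('a', 2)})
example_46 = frozenset({
    ('a', 2),
    (('f', 'a', 'a'), 6),
    (('f', 'a', two_a), 2),
})
```

`example_46` encodes the term $2.a \sqcup 6.f(a,a) \sqcup 2.f(a, 2.a)$ of Example 46. Because a `frozenset` is unordered, the sketch also reflects the fact that terms equivalent modulo $=_P$ reach the same state.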
Flat tree automata are designed to take into account the $=_P$ relation, which
is stated by the next proposition.

Proposition 27. Let $s$, $t$ be two flat terms such that $s =_P t$, let $\mathcal{A}$ be a flat
tree automaton, then $s \to^*_{\mathcal{A}} q$ implies $t \to^*_{\mathcal{A}} q$.

Proof. The proof is by structural induction on $s$. □

Proposition 28. Given a flat term $t$ and a flat tree automaton $\mathcal{A}$, it is decidable
whether $t$ is accepted by $\mathcal{A}$.

Proof. The decision algorithm for membership for flat tree automata is nearly
the same as the one for tree automata presented in Chapter 1, using an oracle
for the decision of Presburger arithmetic formulas. □

Our definition of flat tree automata corresponds to nondeterministic flat tree
automata. We now define deterministic flat tree automata (DFTAC).
Let $\mathcal{A} = (Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ be a NFTAC over $\mathcal{G}$.

• The automaton $\mathcal{A}$ is deterministic if for each ground flat term $t$, there
is at most one state $q$ such that $t \to^*_{\mathcal{A}} q$.

• The automaton $\mathcal{A}$ is complete if for each ground flat term $t$, there is a state
$q$ such that $t \to^*_{\mathcal{A}} q$.

• A state $q$ is accessible if there is one ground flat term $t$ such that $t \to^*_{\mathcal{A}} q$.
The automaton is reduced if all states are accessible.

4.6.3 Reducing Non-determinism 

As for usual tree automata, there is an algorithm for computing an equivalent
DFTAC from any NFTAC, which proves that a language recognized by a NFTAC
is also recognized by a DFTAC. The algorithm is similar to the
determinization algorithm of the class AWEDC: the ambiguity arising from overlapping
constraints is lifted by considering mutually exclusive constraints which cover
the original constraints, and using sets of states makes it possible to get rid of the
non-determinism of rules having the same left-hand side. Here, we simply have to
distinguish between states of $Q_{\mathcal{F}}$ and states of $Q_U$.

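Determinization algorithm

input: $\mathcal{A} = (Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ a NFTAC.

begin

  A state $[q]$ of the equivalent DFTAC is in $2^{Q_{\mathcal{F}}} \cup 2^{Q_U}$.

  Set $Q^d_{\mathcal{F}} = \emptyset$, $Q^d_U = \emptyset$, $\Delta_d = \emptyset$.

  repeat

    for each $f$ of arity $n$, $[q]_1, \dots, [q]_n \in Q^d_{\mathcal{F}} \cup Q^d_U$ do
      let $[q] = \{q \mid \exists f(q_1, \dots, q_n) \to q \in \Delta$ with $q_i \in [q]_i$ for $i = 1, \dots, n\}$
      in Set $Q^d_{\mathcal{F}}$ to $Q^d_{\mathcal{F}} \cup \{[q]\}$
         Set $\Delta_d$ to $\Delta_d \cup \{f([q]_1, \dots, [q]_n) \to [q]\}$
    endfor

    for each $[q] \in Q^d_{\mathcal{F}}$ do
      for each $[q'] \subseteq \{q'' \mid \exists N.q \xrightarrow{c(N)} q'' \in \Delta$ with $q \in [q]\}$ do
        let $C$ be $\bigl(\bigwedge_{q' \in [q']} \bigvee_{N.q \xrightarrow{c} q' \in \Delta,\ q \in [q]} c(N)\bigr) \wedge \bigl(\bigwedge_{q' \notin [q']} \bigwedge_{N.q \xrightarrow{c} q' \in \Delta,\ q \in [q]} \neg c(N)\bigr)$
        in Set $Q^d_U$ to $Q^d_U \cup \{[q']\}$
           Set $\Delta_d$ to $\Delta_d \cup \{N.[q] \xrightarrow{C} [q']\}$
      endfor
    endfor

    for each $[q]_1, [q]_2 \in Q^d_U$ do
      let $[q] = \{q \mid \exists q_1 \in [q]_1, q_2 \in [q]_2,\ q_1 \sqcup q_2 \to q \in \Delta\}$
      in Set $Q^d_U$ to $Q^d_U \cup \{[q]\}$
         Set $\Delta_d$ to $\Delta_d \cup \{[q]_1 \sqcup [q]_2 \to [q]\}$
    endfor

  until no rule can be added to $\Delta_d$

  Set $Q^d_f = \{[q] \in Q^d_{\mathcal{F}} \cup Q^d_U \mid [q] \cap Q_f \ne \emptyset\}$

end

output: $\mathcal{A}_d = (Q^d_{\mathcal{F}}, Q^d_U, \mathcal{G}, Q^d_f, \Delta_d)$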
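
Two steps of this algorithm can be illustrated in Python. This is a sketch under assumptions of our own: rules for function symbols are hypothetical tuples `(symbol, argument_states, target)`, rules $N.q \to q'$ are tuples `(q, c, target)`, and each constraint `c` is an ordinary Python predicate standing in for a Presburger formula. The second function shows why the combined constraints $C$ are mutually exclusive: for a given multiplicity, exactly one target set is selected.

```python
def f_target(f_rules, symbol, arg_sets):
    """Subset-construction step: the set [q] reached by f([q]_1, ..., [q]_n)."""
    return frozenset(
        q for (g, args, q) in f_rules
        if g == symbol
        and len(args) == len(arg_sets)
        and all(a in s for a, s in zip(args, arg_sets))
    )


def n_target(n_rules, state_set, n):
    """The unique set [q'] whose combined constraint C holds for multiplicity n."""
    return frozenset(
        q2 for (q1, c, q2) in n_rules
        if q1 in state_set and c(n)
    )


# Hypothetical rules: the even-multiplicity rule of Example 45 plus one
# extra rule (ours) that overlaps with it on multiples of 6.
n_rules = [
    ('q', lambda n: n % 2 == 0, 'qu'),
    ('q', lambda n: n % 3 == 0, 'qv'),
]
f_rules = [
    ('a', (), 'q'),
    ('f', ('q', 'q'), 'q'),
    ('f', ('q', 'q'), 'p'),
]
```

For instance, multiplicity 6 selects the set containing both `'qu'` and `'qv'`, multiplicity 2 selects only `'qu'`, and multiplicity 1 selects the empty set, so no two of the new rules can apply to the same $n.[q]$.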

Proposition 29. The previous algorithm terminates and computes a
deterministic flat tree automaton equivalent to the initial one.

Proof. The termination is obvious. The proof of the correctness relies on the
following lemma:

Lemma 6. $t \to^*_{\mathcal{A}_d} [q]$ if and only if $t \to^*_{\mathcal{A}} q$ for all $q \in [q]$.

The proof is by structural induction on $t$ and follows the same pattern as
the proof for the class AWEDC. □

Therefore we have proved the following theorem stating the equivalence
between DFTAC and NFTAC.

Theorem 38. Let $L$ be a semilinear set of flat terms, then there exists a DFTAC
that accepts $L$.
4.6.4 Closure Properties of Semilinear Flat Languages

Given an automaton $\mathcal{A} = (Q, \mathcal{G}, Q_f, \Delta)$, it is easy to construct an equivalent
complete automaton. If $\mathcal{A}$ is not complete then

• add two new trash states $q_t$ of sort $\mathcal{F}$ and $q^u_t$ of sort $U$,

• for each $f \in \mathcal{F}$, $q_1, \dots, q_n \in Q \cup \{q_t, q^u_t\}$ such that there is no rule having
$f(q_1, \dots, q_n)$ as left-hand side, add $f(q_1, \dots, q_n) \to q_t$,

• for each $q$ of sort $\mathcal{F}$, let $c_1(N), \dots, c_m(N)$ be the conditions of the rules
$N.q \xrightarrow{c_i(N)} q'$,

- if the formula $\exists N\, (c_1(N) \vee \dots \vee c_m(N))$ is not equivalent to true,
then add the rule $N.q \xrightarrow{\neg(c_1(N) \vee \dots \vee c_m(N))} q^u_t$,

- if there are some $q$, $q'$ of sort $U$ such that there is no rule $q \sqcup q' \to q''$,
then add the rules $q \sqcup q' \to q^u_t$ and $q' \sqcup q \to q^u_t$,

- if there is some rule $(q_1 \sqcup q_2) \sqcup q_3 \to q^u_t$ (resp. $q_1 \sqcup (q_2 \sqcup q_3) \to q^u_t$),
add the rule $q_1 \sqcup (q_2 \sqcup q_3) \to q^u_t$ (resp. $(q_1 \sqcup q_2) \sqcup q_3 \to q^u_t$) if it is
missing.

This last step ensures that we build a flat tree automaton, and it is
straightforward to see that this automaton is equivalent to the initial one (same proof
as for DFTA). This is stated by the following theorem.

Theorem 39. For each flat tree automaton $\mathcal{A}$, there exists a complete equivalent
automaton $\mathcal{B}$.



Example 48. The automaton of Example 45 is not complete. It can be
completed by adding the states $q_t$, $q^u_t$, and the rules

$N.q_t \xrightarrow{N \ge 0} q^u_t$

$N.q \xrightarrow{\exists n : N = 2n + 1} q^u_t$

$f(\_, \_) \to q_t$

where (_, _) stands for a pair of $q$, $q_u$, $q_t$, $q^u_t$ such that a rule the left-hand side
of which is $f(\_, \_)$ is not already in $\Delta$.



Theorem 40. The class of semilinear flat languages is closed under union. 

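Proof. Let $L$ (resp. $M$) be a semilinear flat language recognized by
$\mathcal{A} = (Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ (resp. $\mathcal{B} = (Q'_{\mathcal{F}}, Q'_U, \mathcal{G}, Q'_f, \Delta')$), then $L \cup M$ is recognized
by $\mathcal{C} = (Q_{\mathcal{F}} \cup Q'_{\mathcal{F}}, Q_U \cup Q'_U, \mathcal{G}, Q_f \cup Q'_f, \Delta \cup \Delta')$. □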

Theorem 41. The class of semilinear flat languages is closed under
complementation.

Proof. Let $\mathcal{A}$ be an automaton recognizing $L$. Compute a complete automaton
$\mathcal{B}$ equivalent to $\mathcal{A}$. Compute a deterministic automaton $\mathcal{C}$ equivalent to $\mathcal{B}$ using
the determinization algorithm. The automaton $\mathcal{C}$ is still complete, and we get an
automaton recognizing the complement of $L$ by exchanging final and non-final
states in $\mathcal{C}$. □
From the closure under union and complement, we get the closure under
intersection (a direct construction of an automaton recognizing the intersection
also exists).

Theorem 42. The class of semilinear flat languages is closed under
intersection.

4.6.5 Emptiness Decision

The last important property to state is that the emptiness of the language
recognized by a flat tree automaton is decidable. The decision procedure relies
on an algorithm similar to the decision procedure for tree automata, combined
with a decision procedure for Presburger arithmetic. However, a straightforward
modification of the algorithm in Chapter 1 doesn't work. Assume that the
automaton contains the rule $q_1 \sqcup q_1 \to q_2$ and assume that there is some
flat term $t$ such that $1.t \to^*_{\mathcal{A}} q_1$. These two hypotheses don't imply that
$1.t \sqcup 1.t \to^*_{\mathcal{A}} q_2$, since $1.t \sqcup 1.t$ is not a flat term, contrary to $2.t$. Therefore the decision
procedure involves some combinatorics in order to ensure that we always deal
with correct flat terms.

From now on, let $\mathcal{A} = (Q_{\mathcal{F}}, Q_U, \mathcal{G}, Q_f, \Delta)$ be some given deterministic flat
tree automaton and let $M$ be the number of states of sort $U$. First, we need to
control the possibly infinite number of solutions of Presburger's conditions.

Proposition 30. There is some computable B such that for each condition 
c(N) of the rules of A, either each integer n validating c is smaller than B or 
there are at least M + 1 integers smaller than B validating c. 

Proof. First, for each constraint $c(N)$ of a rule of $\mathcal{A}$, we check if $c(N)$ has a
finite number of solutions by deciding if $\exists P : c(N) \Rightarrow N < P$ is true. If $c(N)$
has a finite number of solutions, it is easy to find a bound $B_1(c(N))$ on these
solutions by testing $\exists n : n > k \wedge c(n)$ for $k = 1, 2, \dots$ until it is false. If $c(N)$
has an infinite number of solutions, one computes the $M^{th}$ solution obtained
by checking $\models c(k)$ for $k = 1, 2, \dots$. We call this $M^{th}$ solution $B_2(c(N))$. The
bound $B$ is the maximum of all the $B_1(c(N))$'s and $B_2(c(N))$'s. □

Now we control the maximal width of terms needed to reach a state. 

Proposition 31. For all $t \to^*_{\mathcal{A}} q$, there is some $s \to^*_{\mathcal{A}} q$ such that for each
subterm of $s$ of the form $n_1.v_1 \sqcup \dots \sqcup n_p.v_p$, we have $p \le M$ and $n_i \le B$.

Proof. The bound on the coefficients $n_i$ is a direct consequence of the previous
proposition. The proof on $p$ is by structural induction on $t$. The only non-trivial
case is for $t = m_1.t_1 \sqcup \dots \sqcup m_k.t_k$. Let us assume that $t$ is the term with the
smallest value of $k$ among the terms $\{t' \mid t' \to^*_{\mathcal{A}} q\}$.

First we show that $k \le M$. Let $q_{[i]}$ be the states such that $m_i.t_i \to^*_{\mathcal{A}} q_{[i]}$.
We have thus $t \to^*_{\mathcal{A}} q_{[1]} \sqcup \dots \sqcup q_{[k]} \to^*_{\mathcal{A}} q$. By definition of DFTAC, the reduction
$q_{[1]} \sqcup \dots \sqcup q_{[k]} \to^*_{\mathcal{A}} q$ has the form:

$q_{[1]} \sqcup \dots \sqcup q_{[k]} \to_{\mathcal{A}} q_{[1,2]} \sqcup q_{[3]} \sqcup \dots \sqcup q_{[k]} \to_{\mathcal{A}} \dots \to_{\mathcal{A}} q_{[1,\dots,k]} = q$

for some states $q_{[1,2]}, \dots, q_{[1,\dots,k]}$ of $Q_U$.
Assume that $k > M$. The pigeonhole principle yields that $q_{[1,\dots,j_1]} = q_{[1,\dots,j_2]}$
for some $1 \le j_1 < j_2 \le k$. Therefore the term

$t' = m_1.t_1 \sqcup \dots \sqcup m_{j_1}.t_{j_1} \sqcup m_{j_2+1}.t_{j_2+1} \sqcup \dots \sqcup m_k.t_k$

also reaches the state $q$, which contradicts our hypothesis that $k$ is minimal.

Now, it remains only to use the induction hypothesis to replace each $t_i$ by
some $s_i$ reaching the same state and satisfying the required conditions. □

A term $s$ such that for all subterms $n_1.v_1 \sqcup \dots \sqcup n_p.v_p$ of $s$, we have $p \le M$ and
$n_i \le B$ will be called small. We define some extension $\to^n_{\mathcal{A}}$ of the move relation
by:

• $t \to^1_{\mathcal{A}} q$ if and only if $t \to_{\mathcal{A}} q$,

• $t \to^n_{\mathcal{A}} q$ if and only if $t \to^*_{\mathcal{A}} q$ and

- either $t = f(t_1, \dots, t_k)$ and for $i = 1, \dots, k$ we have $t_i \to^{n-1}_{\mathcal{A}} q_i(t_i)$,

- or $t = n_1.t_1 \sqcup \dots \sqcup n_p.t_p$ and for $i = 1, \dots, p$, we have $t_i \to^{n-1}_{\mathcal{A}} q_i(t_i)$.

Let $\mathcal{L}^n_q = \{t \mid t \to^p_{\mathcal{A}} q,\ p \le n$ and $t$ is small$\}$ with the convention that $\mathcal{L}^0_q = \emptyset$
and $\mathcal{L}_q = \bigcup_{n=1}^{\infty} \mathcal{L}^n_q$. By Proposition 31, $t \to^*_{\mathcal{A}} q$ if and only if there is some
$s \in \mathcal{L}_q$ such that $s \to^*_{\mathcal{A}} q$. The emptiness decision algorithm will compute a
finite approximation $\mathcal{R}^n_q$ of these $\mathcal{L}^n_q$ such that $\mathcal{R}^n_q \ne \emptyset$ if and only if $\mathcal{L}^n_q \ne \emptyset$.

Some technical definition is needed first. Let $L$ be a set of flat terms, then
we define $\|L\|_P$ as the number of distinct equivalence classes of terms for the
$=_P$ relation such that one representative of the class occurs in $L$. The reader will
check easily that the equivalence class of a flat term for the $=_P$ relation is finite.

The decision algorithm is the following one.

begin

  for each state $q$ do set $\mathcal{R}^0_q$ to $\emptyset$.

  $i = 1$.

  repeat

    for each state $q$ do set $\mathcal{R}^i_q$ to $\mathcal{R}^{i-1}_q$
      if $\|\mathcal{R}^i_q\|_P \le M$ then
        repeat
          add to $\mathcal{R}^i_q$ all flat terms $t = f(t_1, \dots, t_n)$
            such that $t_j \in \mathcal{R}^{i-1}_{q_j}$, $j \le n$ and $f(q_1, \dots, q_n) \to q \in \Delta$
          add to $\mathcal{R}^i_q$ all flat terms $t = n_1.t_1 \sqcup \dots \sqcup n_p.t_p$
            such that $p \le M$, $n_j \le B$, $t_j \in \mathcal{R}^{i-1}_{q_j}$ and $n_1.q_1 \sqcup \dots \sqcup n_p.q_p \to^*_{\mathcal{A}} q$.
        until no new term can be added or $\|\mathcal{R}^i_q\|_P > M$
      endif

    $i = i + 1$

  until $\exists q \in Q_f$ such that $\mathcal{R}^i_q \ne \emptyset$ or $\forall q$, $\mathcal{R}^i_q = \mathcal{R}^{i-1}_q$

  if $\exists q \in Q_f$ s.t. $\mathcal{R}^i_q \ne \emptyset$
  then return not empty
  else return empty endif

end

Proposition 32. The algorithm terminates after $n$ iterations for some $n$ and
$\mathcal{R}^n_q = \emptyset$ if and only if $\mathcal{L}_q = \emptyset$.

Proof. At every iteration, either one $\mathcal{R}^i_q$ increases or else all the $\mathcal{R}^i_q$'s are left
untouched in the repeat . . . until loop. Therefore the termination condition
will be satisfied after a finite number of iterations, since equivalence classes for
$=_P$ are finite.

By construction we have $\mathcal{R}^m_q \subseteq \mathcal{L}^m_q$, but we need the following additional
property.

Lemma 7. For all $m$, $\mathcal{R}^m_q = \mathcal{L}^m_q$ or $\mathcal{R}^m_q \subseteq \mathcal{L}^m_q$ and $\|\mathcal{R}^m_q\|_P > M$.

The proof is by induction on $m$.

Base case m = 0. Obvious from the définitions. 

Induction step. We assume that the property is true for m and we prove that 
it holds for m + 1 . 

Either C" 1 = therefore IZ™ = and we are done, or £™ ^ 0, which we assume 
from now on. 

• $q \in Q_t$.

  - Either there is some rule $f(q_1, \ldots, q_n) \to q$ such that
    $\mathcal{R}^m_{q_i} \neq \emptyset$ for all $i = 1, \ldots, n$ and such that
    for some $q'$ among $q_1, \ldots, q_n$, we have
    $\|\mathcal{R}^m_{q'}\|_P > M$. Then we can construct at least M + 1 terms
    $t = f(t_1, \ldots, t', \ldots, t_n)$ where $t' \in \mathcal{R}^m_{q'}$, such
    that $t \in \mathcal{R}^{m+1}_q$, by giving M + 1 non-equivalent values to
    $t'$ (the corresponding values for t are also non-equivalent). This yields
    that $\|\mathcal{R}^{m+1}_q\|_P > M$.

  - Or there is no rule as above, therefore
    $\mathcal{R}^{m+1}_q = \mathcal{L}^{m+1}_q$.

• $q \in Q_u$.

  For each small term $t = n_1.t_1 \cup \ldots \cup n_p.t_p$ such that
  $t \in \mathcal{L}^{m+1}_q$, there are some terms $s_1, \ldots, s_p$ in
  $\mathcal{R}^m$ such that $t_i \to_A q_i$ implies that $s_i \to_A q_i$.
  What we must prove is that $\|\mathcal{R}^m_{q_i}\|_P > M$ for some
  $i \le p$ implies $\|\mathcal{R}^{m+1}_q\|_P > M$. Since A is deterministic,
  we have that $s \to_A q$ and $t \to_A q'$ with $q \neq q'$ implies that
  $s \neq_P t$. Let S be the set of states occurring in the sequence
  $q_1, \ldots, q_p$. We prove by induction on the cardinal of S that if there
  is some $q_i$ such that $\|\mathcal{R}^m_{q_i}\|_P > M$, we can build at least
  M + 1 terms in $\mathcal{R}^{m+1}_q$; otherwise we build at least one term of
  $\mathcal{R}^{m+1}_q$.

  Base case $S = \{q'\}$, and therefore all the $q_i$ are equal to $q'$. Either
  $\|\mathcal{R}^m_{q'}\|_P \le M$ and we are done, or
  $\|\mathcal{R}^m_{q'}\|_P > M$ and we know that there are
  $s_1, \ldots, s_{M+1}, \ldots$ pairwise non-equivalent terms reaching $q'$.
  Therefore, there are at least M + 1 different non-equivalent possible terms
  $n_1.s_{i_1} \cup \ldots \cup n_p.s_{i_p}$. Moreover each of these terms s
  satisfies $s \to_A q$, which proves the result.

  Induction step. Let $S = S' \cup \{q'\}$ where the property is true for $S'$.
  We can assume that $\|\mathcal{R}^m_{q'}\|_P \le M$ (otherwise all
  $\|\mathcal{R}^m_{q_i}\|_P$ are less than or equal to M).

  Let $i_1, \ldots, i_k$ be the positions of $q'$ in $q_1, \ldots, q_p$, and let
  $j_1, \ldots, j_l$ be the positions of the states different from $q'$ in
  $q_1, \ldots, q_p$. By induction hypothesis, there are some flat terms $s_j$
  such that $n_{j_1}.s_{j_1} \cup \ldots \cup n_{j_l}.s_{j_l}$ is a valid flat
  term. Since A is deterministic and $q'$ is different from all elements of
  $S'$, we know that $s_i \neq_P s_j$ for any $i \in \{i_1, \ldots, i_k\}$,
  $j \in \{j_1, \ldots, j_l\}$. Therefore, we use the same reasoning as in the
  previous case to build at least M + 1 pairwise non-equivalent flat terms
  $s = n_1.s_1 \cup \ldots \cup n_p.s_p$ such that $s \to_A q$.

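The termination of the algorithm implies that for each $m \ge n$,
$\mathcal{R}^m_q = \mathcal{L}^m_q$, or $\mathcal{R}^m_q \subseteq \mathcal{L}^m_q$
and $\|\mathcal{R}^m_q\|_P > M$. Therefore $\mathcal{L}_q = \emptyset$ if and only
if $\mathcal{R}^n_q = \emptyset$.

□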

The following theorem summarizes the previous results. 

Theorem 43. Let A be a DFTAC, then it is decidable whether the language 
accepted by A is empty or not. 

The reader should see that the property that A is deterministic is crucial in
proving the emptiness decision property. Therefore proving the emptiness of the
language recognized by an NFTAC requires computing an equivalent DFTAC
first.

Another point is that the previous algorithm can be easily modified to
compute the set of accessible states of A.

4.7 Exercises 

Exercise 52. 

1. Show that the automaton $A_+$ of Example 38 accepts only terms of the form
$f(t_1, s^n(0), s^m(0), s^{n+m}(0))$.

2. Conversely, show that, for every pair of natural numbers (n, m), there exists a
term $t_1$ such that $f(t_1, s^n(0), s^m(0), s^{n+m}(0))$ is accepted by $A_+$.

3. Construct an automaton $A_\times$ of the class AWEDC which has the same
properties as above, replacing + with ×.

4. Give a proof that emptiness is undecidable for the class AWEDC, reducing 
Hilbert's tenth problem. 

Exercise 53. Give an automaton of the class AWCBB which accepts the set of terms
t (over the alphabet {a(0), b(0), f(2)}) having a subterm of the form f(u, u) (i.e. the
set of terms that are reducible by a rule f(x, x) → v).

Exercise 54. Show that the class AWCBB is not closed under linear tree homomor- 
phisms. Is it closed under inverse image of such morphisms? 

Exercise 55. Give an example of two automata in AWCBB such that the set of pairs
of terms recognized respectively by the automata is not itself a member of AWCBB.

Exercise 56. (Proposition 24) Show that the class of (languages recognized by)
reduction automata is closed under intersection and union. Show that the set of
balanced terms on alphabet {a, f} is not recognizable by a reduction automaton,
showing that the class of (languages recognized by) reduction automata is not
closed under complement.

Exercise 57. Show that the class of languages recognized by reduction automata is
preserved under linear tree homomorphisms. Show however that this is no longer true
for arbitrary tree homomorphisms.

Exercise 58. Let A be a reduction automaton. We define a ternary relation
$q \xrightarrow{w} q'$ contained in $Q \times \mathbb{N}^* \times Q$ as follows:

• for $i \in \mathbb{N}$, $q \xrightarrow{i} q'$ if and only if there is a rule
$f(q_1, \ldots, q_n) \xrightarrow{c} q'$ with $q_i = q$

• $q \xrightarrow{i \cdot w} q'$ if and only if there is a state $q''$ such that
$q \xrightarrow{i} q''$ and $q'' \xrightarrow{w} q'$.

Moreover, we say that a state $q \in Q$ is a constrained state if there is a rule
$f(q_1, \ldots, q_n) \xrightarrow{c} q$ in A such that c is not a valid constraint.

We say that the constraints of A cannot overlap if, for each rule
$f(q_1, \ldots, q_n) \xrightarrow{c} q$ and for each equality $\pi = \pi'$ (resp.
disequality $\pi \neq \pi'$) of c, there is no strict prefix p of $\pi$ and no
constrained state $q'$ such that $q' \xrightarrow{p} q$.

1. Consider the rewrite system on the alphabet {f(2), g(1), a(0)} whose left
members are f(x, g(x)), g(g(x)), f(a, a). Compute a reduction automaton whose
constraints do not overlap and which accepts the set of irreducible ground terms.

2. Show that emptiness can be decided in polynomial time for reduction automata
whose constraints do not overlap. (Hint: it is similar to the proof of Theorem
33.)

3. Show that any language recognized by a reduction automaton whose constraints
do not overlap is a homomorphic image of a language in the class AWCBB.
Give an example showing that the converse is false.

Exercise 59. Prove the Proposition ?? along the lines of Proposition 15. 

Exercise 60. The purpose of this exercise is to give a construction of an automaton
with disequality constraints (no equality constraints) whose emptiness is equivalent to
the ground reducibility of a given term t with respect to a given term rewriting system
$\mathcal{R}$.

1. Give a direct construction of an automaton with disequality constraints
$A_{NF(\mathcal{R})}$ which accepts the set of irreducible ground terms.

2. Show that the class of languages recognized by automata with disequality
constraints is closed under intersection. Hence the set of irreducible ground
instances of a linear term is recognized by an automaton with disequality
constraints.

3. Let $A_{NF(\mathcal{R})} = (Q_{NF}, \mathcal{F}, Q^f_{NF}, \Delta_{NF})$. We compute
$A_{NF,t} = (Q_{NF,t}, \mathcal{F}, Q^f_{NF,t}, \Delta_{NF,t})$ as follows:

$Q_{NF,t} \stackrel{\text{def}}{=} \{t\sigma|_p \mid p \in Pos(t)\} \times Q_{NF}$, where
$\sigma$ ranges over substitutions from NLV(t) (the set of variables occurring at
least twice in t) into $Q^f_{NF}$.

For all $f(q_1, \ldots, q_n) \xrightarrow{c} q \in \Delta_{NF}$, and all
$u_1, \ldots, u_n \in \{t\sigma|_p \mid p \in Pos(t)\}$, $\Delta_{NF,t}$ contains the
following rules:

- $f([q_{u_1}, q_1], \ldots, [q_{u_n}, q_n]) \xrightarrow{c \wedge c'} [q_{f(u_1, \ldots, u_n)}, q]$
  if $f(u_1, \ldots, u_n) = t\sigma$ and $c'$ is constructed as sketched below.

- $f([q_{u_1}, q_1], \ldots, [q_{u_n}, q_n]) \xrightarrow{c} [q_{f(u_1, \ldots, u_n)}, q]$
  if $[q_{f(u_1, \ldots, u_n)}, q] \in Q_{NF,t}$ and we are not in the first case.

- $f([q_{u_1}, q_1], \ldots, [q_{u_n}, q_n]) \xrightarrow{c} [q_q, q]$ in all other cases.

$c'$ is constructed as follows. From $f(u_1, \ldots, u_n)$ we can retrieve the rules
applied at position p in t. Assume that the rule at p checks $\pi_1 \neq \pi_2$. This
amounts to checking $p\pi_1 \neq p\pi_2$ at the root position of t. Let $\mathcal{D}$ be
all disequalities $p\pi_1 \neq p\pi_2$ obtained in this way. The non-linearity of t
implies some equalities: let $\mathcal{E}$ be the set of equalities $p_1 = p_2$, for all
positions $p_1, p_2$ such that $t|_{p_1} = t|_{p_2}$ is a variable. Now, $c'$ is the set
of disequalities $\pi \neq \pi'$ which are not in $\mathcal{D}$ and that can be
inferred from $\mathcal{D}, \mathcal{E}$ using the rules

$pp_1 \neq p_2,\ p = p' \vdash p'p_1 \neq p_2$
$p \neq p',\ pp_1 = p_2 \vdash p'p_1 \neq p_2$

For instance, let t = f(x, f(x, y)) and assume that the automaton $A_{NF}$ contains
a rule $f(q, q) \xrightarrow{1 \neq 2} q$. Then the automaton $A_{NF,t}$ will contain
the rule

$f([q_q, q], [q_{f(q,q)}, q]) \xrightarrow{1 \neq 2 \,\wedge\, 1 \neq 22} [q_{f(q, f(q,q))}, q]$.

The final states are $[q_u, q_f]$ where $q_f \in Q^f_{NF}$ and u is an instance of t.

Prove that $A_{NF,t}$ accepts at least one term if and only if t is not ground
reducible by $\mathcal{R}$.

Exercise 61. Prove Theorem 37 along the lines of the proof of Theorem 27. 

Exercise 62. Show that the algorithm for deciding emptiness of deterministic
complete flat tree automata works for non-deterministic flat tree automata such
that for each state q the number of non-equivalent terms reaching q is 0 or greater
than or equal to 2.

Exercise 63. (Feature tree automata) 

Let $\mathcal{F}$ be a finite set of feature symbols (or attributes) denoted by
f, g, ... and $\mathcal{S}$ be a set of constructor symbols (or sorts) denoted by
A, B, .... In this exercise and the next one, a tree is a rooted directed acyclic
graph, and a multitree is a tree such that the nodes are labeled over $\mathcal{S}$
and the edges over $\mathcal{F}$. A multitree is either $(A, \emptyset)$ or (A, E)
where E is a finite multiset of pairs (f, t) with f a feature and t a multitree. A
feature tree is a multitree such that the edges outgoing from the same node are
labeled by different features. The + operation takes a multitree t = (A, E), a
feature f and a multitree t' to build the multitree $(A, E \cup (f, t'))$ denoted
by t + ft'.

1. Show that $t + f_1t_1 + f_2t_2 = t + f_2t_2 + f_1t_1$ (OI axiom: order
independence axiom) and that the algebra of multitrees is isomorphic to the
quotient of the free term algebra over $\{+\} \cup \mathcal{F} \cup \mathcal{S}$ by OI.

2. A deterministic $\mathcal{M}$-automaton is a triple $(\mathcal{A}, h, Q_f)$ where
$\mathcal{A}$ is a finite $(\{+\} \cup \mathcal{F} \cup \mathcal{S})$-algebra,
$h : \mathcal{M} \to \mathcal{A}$ is a homomorphism, and $Q_f$ (the final states) is
a subset of the set of the values of sort $\mathcal{M}$. A tree is accepted if and
only if $h(t) \in Q_f$.

(a) Show that an $\mathcal{M}$-automaton can be identified with a bottom-up tree
automaton such that all trees equivalent under OI reach the same states.

(b) A feature tree automaton is an $\mathcal{M}$-automaton such that for each sort s
($\mathcal{M}$ or $\mathcal{F}$), for each q the set of the c's of arity 0
interpreted as q in $\mathcal{A}$ is finite or co-finite. Give a feature tree
automaton to recognize the set of natural numbers where n is encoded as
$(0, \{suc, (0, \{\ldots, (0, \emptyset)\})\})$ with n edges labeled by suc.



TATA — September 6, 2005 




(c) Show that the class of languages accepted by feature tree automata is
closed under boolean operations and that the emptiness of a language
accepted by a feature automaton is decidable.

(d) A non-deterministic feature tree automaton is a tuple $(Q, P, h, Q_f)$ such
that Q is the set of states of sort $\mathcal{M}$, P the set of states of sort
$\mathcal{F}$, and h is composed of three functions $h_1 : \mathcal{S} \to 2^Q$,
$h_2 : \mathcal{F} \to 2^P$ and the transition function
$+ : Q \times P \times Q \to 2^Q$. Moreover
$q + p_1q_1 + p_2q_2 = q + p_2q_2 + p_1q_1$ for each $q, q_1, q_2, p_1, p_2$, and
$\{s \in \mathcal{S} \mid p \in h_1(s)\}$ and $\{f \in \mathcal{F} \mid p \in h_2(f)\}$
are finite or co-finite for each p. Show that any non-deterministic feature tree
automaton is equivalent to a deterministic feature tree automaton.

Exercise 64. (Characterization of recognizable flat feature languages)
A flat feature tree is a feature tree of depth 1 where depth is defined by
$depth((A, \emptyset)) = 0$ and $depth((A, E)) = 1 + \max\{depth(t) \mid (f, t) \in E\}$.
Counting constraints are defined by:

C(x) ::= $card(\{\varphi \in F \mid \exists y.(x\varphi y) \wedge Ty\}) = n \bmod m$
       | $Sx$
       | $C(x) \vee C(x)$
       | $C(x) \wedge C(x)$

where n, m are integers, S and T are finite or co-finite subsets of $\mathcal{S}$, F
is a finite or co-finite subset of $\mathcal{F}$, and n mod 0 is defined as n. The
semantics of the first type of constraint is: C(x) holds if the number of edges of x
going from the root to a node labeled by a symbol of T is equal to n mod m. The
semantics of Sx is: Sx holds if the root of x is labeled by a symbol of S.

1. Show that the constraints are closed under negation. Show that the following
constraints can be expressed in the constraint language (F is a finite subset of
$\mathcal{F}$, $f \in F$, $A \in \mathcal{S}$): there is one edge labeled f from the
root, a given finite subset of $\mathcal{F}$. There is no edge labeled f from the
root, the root is labeled by A.

2. A set L of flat multitrees is counting definable if and only if there is some
counting constraint C such that $L = \{x \mid C(x) \text{ holds}\}$. Show that a set
of flat feature trees is counting definable if and only if it is recognizable by a
feature tree automaton. Hint: identify flat trees with multisets over
$(\mathcal{F} \cup \{root\}) \times \mathcal{S}$ and + with multiset union.



4.8 Bibliographic notes

RATEG appeared in Mongy's thesis [Mon81]. Unfortunately, as shown in
[Mon81], the emptiness problem is undecidable for the class RATEG (and hence
for AWEDC). The undecidability can even be shown for a more restricted class
of automata with equality tests between cousins (see [Tom92]).

The remarkable subclass AWCBB is defined in [BT92]. This paper presents the
results cited in Section 4.3, especially Theorem 33.

Concerning complexity, the result used in Section 4.3.2 (EXPTIME-completeness
of the emptiness of the intersection of n recognizable tree languages) may be
found in [FSVY91, Sei94b].

[DCC95] is concerned with reduction automata and their use as a tool for the
decision of the encompassment theory in the general case.

The first decidability proof for ground reducibility is due to [Pla85]. In [CJ97a],
ground reducibility decision is shown EXPTIME-complete. In this work, an
EXPTIME algorithm for emptiness decision for AWEDC with only disequality
constrained The result mentioned in Section 4.5.

The class of generalized reduction automata is introduced in [CCC+94]. In this
paper, an efficient cleaning algorithm is given for emptiness decision.

There has been much work dealing with automata where the width of
terms is not bounded. In [Cou89], Courcelle devises an algebraic notion of
recognizability and studies the case of equational theories. Then he gives several
equational theories corresponding to several notions of trees like ordered or
unordered, ranked or unranked trees and provides the tree automata to accept
these objects. Actually the axioms used for defining these notions are
commutativity (for unordered) or associativity (for unranked) and what is needed is
to build tree automata such that all elements of the same equivalence class reach
the same state. Trees can also be defined as finite, acyclic rooted ordered graphs
of bounded degree. Courcelle [Cou92] has devised a notion of recognizable set of
graphs and suggests devising graph automata for accepting recognizable graphs
of bounded tree width. He gives such automata for trees defined as unbounded,
unordered, undirected, unrooted trees (therefore these are not what we call trees
in this book). Actually, he shows that recognizable sets of graphs are (homomorphic
images of) sets of equivalence classes of terms, where the equivalence relation
is the congruence induced by a set of equational axioms including the
associativity-commutativity axioms and an identity element. He gives several
equivalent notions for recognizability from which he gets the definitions of
automata for accepting recognizable languages. Hedge automata [PQ68, Mur00,
BKMW01] are automata that deal with unranked but ordered terms, and use
constraints which are membership in some regular word expressions on an alphabet
which is the set of states of the automaton. These automata are closed under the
boolean operations and emptiness can be decided. Such automata are used for XML
applications. Generalizations of tree automata with Presburger's constraints can
be found in [LD02].

Feature trees are a generalization of first-order trees introduced for modeling
record structures. A feature tree is a finite tree whose nodes are labelled by
constructor symbols and whose edges are labelled by feature symbols. Niehren and
Podelski [NP93] have studied the algebraic structures of feature trees and have
devised feature tree automata for recognizing sets of feature trees. They have
shown that this class of feature trees enjoys the same properties as regular tree
languages and they give a characterization of these sets by requiring that the
number N of occurrences of a feature f satisfies a Presburger formula $\psi_f(N)$.
See Exercise 63 for more details. Equational tree automata, introduced by
H. Ohsaki, allow equational axioms to take place during a run. For instance, using
AC axioms makes it possible to recognize languages which are closed under
associativity-commutativity, which is not the case of ordinary regular languages.
See [Ohs01] for details.

Chapter 5

Tree Set Automata 



This chapter introduces a class of automata for sets of terms called Generalized
Tree Set Automata. Languages associated with such automata are sets of
sets of terms. The class of languages recognized by Generalized Tree Set
Automata fulfills properties that suffice to build automata-based procedures for
solving problems involving sets of terms, for instance, for solving systems of set
constraints.



5.1 Introduction 

"The notion of type expresses the fact that one just cannot apply any operator 
to any value. Inferring and checking a program's type is then a proof of partial 
correction" quoting Marie-Claude Gaudel. "The main problem in this field is to 
be flexible while remaining rigorous, that is to allow polymorphism (a value can
have more than one type) in order to avoid repetitions and write very general
programs while preserving decidability of their correction with respect to types."

On that score, the set constraints formalism is a compromise between power
of expression and decidability. This has been the object of active research for a
few years.

Set constraints are relations between sets of terms. For instance, let us define
the natural numbers with 0 and the successor relation denoted by s. Thus, the
constraint

$Nat = 0 \cup s(Nat)$    (5.1)

corresponds to this definition. Let us consider the following system:

$Nat = 0 \cup s(Nat)$
$List = cons(Nat, List) \cup nil$
$List_+ \subseteq List$
$car(List_+) \subseteq s(Nat)$    (5.2)

The first constraint defines natural numbers. The second constraint codes the
set of LISP-like lists of natural numbers. The empty list is nil and other lists
are obtained using the constructor symbol cons. The last two constraints
represent the set of lists with a non-zero first element. Symbol car has the usual
interpretation: the head of a list. Here $car(List_+)$ can be interpreted as the set
of all terms at first position in $List_+$, that is all terms t such that there exists u
with $cons(t, u) \in List_+$. In the set constraint framework such an operator car is
often written $cons^{-1}_{(1)}$.

Set constraints are the essence of Set Based Analysis. The basic idea is to
reason about program variables as sets of possible values. Set Based Analysis
involves first writing set constraints expressing relationships between sets of
program values, and then solving the system of set constraints. A single
approximation is: all dependencies between the values of program variables are
ignored. Techniques developed for Set Based Analysis have been successfully applied
in program analysis and type inference and the technique can be combined with
others [HJ92].

Set constraints have also been used to define a constraint logic programming
language over sets of ground terms that generalizes ordinary logic programming
over a Herbrand domain [Koz98].

In a more general way, a system of set constraints is a conjunction of positive
constraints of the form $exp \subseteq exp'$¹ and negative constraints of the form
$exp \not\subseteq exp'$. Right hand side and left hand side of these inequalities
are set expressions, which are built with

• function symbols: in our example 0, s, cons, nil are function symbols.

• operators: union $\cup$, intersection $\cap$, complement $\sim$

• projection symbols: for instance, in the last equation of system (5.2) car
denotes the first component of cons. In the set constraints syntax, this is
written $cons^{-1}_{(1)}$.

• set variables like Nat or List.

An interpretation assigns to each set variable a set of terms only built with
function symbols. A solution is an interpretation which satisfies the system.
For example, {0, s(0), s(s(0)), ...} is a solution of Equation (5.1).
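
As an illustration only, the solution of Equation (5.1) mentioned above can be approximated by iterating the right-hand side of the constraint starting from the empty set. The sketch below assumes terms are encoded as nested Python tuples; the names `step` and `approximate_nat` are hypothetical helpers introduced here, and the true solution is infinite, so only a finite prefix of it is ever built.

```python
# Terms are nested tuples: ("0",) is the constant 0 and ("s", t) is s(t).
ZERO = ("0",)


def s(term):
    """Apply the successor symbol s to a term."""
    return ("s", term)


def step(nat):
    """One application of the right-hand side of (5.1): 0 ∪ s(Nat)."""
    return {ZERO} | {s(t) for t in nat}


def approximate_nat(rounds):
    """Iterate `step` from the empty set (hypothetical helper).

    After k rounds the result is {0, s(0), ..., s^(k-1)(0)}.
    """
    nat = set()
    for _ in range(rounds):
        nat = step(nat)
    return nat


approx = approximate_nat(4)
print(len(approx))             # number of terms collected after 4 rounds
print(approx <= step(approx))  # each round only adds terms
```

Each round adds exactly one new term, so the union of all rounds is the set {0, s(0), s(s(0)), ...} quoted in the text.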

In the set constraint formalism, set inclusion and set union express in a
natural way parametric polymorphism: $List \subseteq nil \cup cons(A, List)$.

In logic or functional programming, one often uses dynamic procedures to
deal with types. In other words, a run-time procedure checks whether or not an
expression is well-typed. This permits maximum programming flexibility at the
potential cost of efficiency and security. Static analysis partially avoids these
drawbacks with the help of type inference and type checking procedures. The
information extracted at compile time is also used for optimization.

Basically, program sources are analyzed at compile time and an ad hoc
formalism is used to represent the result of the analysis. For types considered as
sets of values, the set constraints formalism is well suited to represent them and
to express their relations. Numerous inference and type checking algorithms in
logic, functional and imperative programming are based on a resolution
procedure for set constraints.



¹ $exp = exp'$ for $exp \subseteq exp' \wedge exp' \subseteq exp$.

Most of the earliest algorithms consider systems of set constraints with weak
power of expression. More often than not, these set constraints always have a
least solution (w.r.t. inclusion) which corresponds to a (tuple of) regular
set(s) of terms. In this case, types are usual sorts. A sort signature defines a
tree automaton (see Section 3.4.1 for the correspondence between automata
and sorts). For instance, regular equations introduced in Section 2.3 are such a
subclass of set constraints. Therefore, these methods are closely related to finite
tree automata and use classical algorithms on these recognizers, like the ones
presented in Chapter 1.

In order to obtain more precise information with set constraints in static
analysis, one way is to enrich the set constraints vocabulary. On the one hand, with
a large vocabulary an analysis can be accurate and relevant, but on the other
hand, solutions are difficult to obtain.

Nonetheless, an essential property must be preserved: the decidability of
satisfiability. There must exist a procedure which determines whether or not a
system of set constraints has solutions. In other words, extracted information
must be sufficient to say whether the objects of an analyzed program have a type.
It is crucial, therefore, to know which classes of set constraints are decidable,
and identifying the complexity of set constraints is of paramount importance.

A second important characteristic to preserve is the ability to represent
solutions in a convenient way. We want to obtain a kind of solved form from which
one can decide whether a system has solutions and one can "compute" them.

In this chapter, we present an automata-based algorithm for solving systems
of positive and negative set constraints where no projection symbol occurs. We
define a new class of automata recognizing sets of (codes of) n-tuples of tree
languages. Given a system of set constraints, there exists an automaton of this
class which recognizes the set of solutions of the system. Therefore properties
of our class of automata directly translate to set constraints.

In order to introduce our automata, we discuss the case of unary symbols,
i.e. the case of strings over a finite alphabet. For instance, let us consider the
following constraints over the alphabet composed of two unary symbols a and
b and a constant 0:

$Xaa \cup Xbb \subseteq X$
$Y \subseteq X$    (5.3)

This system of set constraints can be encoded in a formula of the monadic
second order theory of 2 successors named a and b:

$\forall u\ (u \in X \Rightarrow (uaa \in X \wedge ubb \in X)) \ \wedge$
$\forall u\ (u \in Y \Rightarrow u \in X)$

We have depicted in Fig. 5.1 (a beginning of) an infinite tree which is a
model of the formula. Each node corresponds to a string over a and b. The
root is associated with the empty string; going down to the left concatenates an
a; going down to the right concatenates a b. Each node of the tree is labelled
with a couple of points. The two components correspond to sets X and Y. A
black point in the first component means that the current node belongs to X.
Conversely, a white point in the first component means that the current node
does not belong to X. Here we have $X = \{\varepsilon, aa, bb, \ldots\}$ and
$Y = \{\varepsilon, bb, \ldots\}$.
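
The two constraints of (5.3) can be checked mechanically on a finite prefix of the infinite tree. The following is a minimal sketch under stated assumptions: words are Python strings over "ab", the sets X and Y are given as membership predicates, and the particular `in_x` and `in_y` below are one hypothetical choice compatible with the elements listed in the text (the text only gives the first few elements of X and Y). A bounded check like this can refute a candidate solution but cannot prove one.

```python
from itertools import product


def words(max_len, alphabet="ab"):
    """Enumerate all words over the alphabet up to length max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def satisfies(in_x, in_y, max_len):
    """Check Xaa ∪ Xbb ⊆ X and Y ⊆ X on all words of length <= max_len."""
    for u in words(max_len - 2):
        if in_x(u) and not (in_x(u + "aa") and in_x(u + "bb")):
            return False
    return all(in_x(u) for u in words(max_len) if in_y(u))


def in_x(u):
    """Hypothetical X: concatenations of the blocks aa and bb."""
    return len(u) % 2 == 0 and all(u[i] == u[i + 1] for i in range(0, len(u), 2))


def in_y(u):
    """Hypothetical Y: words made of b's only, of even length."""
    return len(u) % 2 == 0 and set(u) <= {"b"}


print(satisfies(in_x, in_y, 6))
```

The label of a node u in the tree of Fig. 5.1 corresponds to the pair (in_x(u), in_y(u)).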





Figure 5.1: An infinite tree for the representation of a couple of word languages
(X, Y). Each node is associated with a word. A black dot stands for belongs
to. $X = \{\varepsilon, aa, bb, \ldots\}$ and $Y = \{\varepsilon, bb, \ldots\}$.



A tree language that encodes solutions of Eq. 5.3 is Rabin-recognizable by 
a tree automaton which must avoid the three forbidden patterns depicted in 
Figure 5.2. 



Figure 5.2: The set of three forbidden patterns. '?' stands for black or white
dot. The tree depicted in Fig. 5.1 excludes these three patterns.



Given a ranked alphabet of unary symbols and one constant and a system
of set constraints over $\{X_1, \ldots, X_n\}$, one can encode a solution with a
$\{0, 1\}^n$-valued infinite tree and the set of solutions is recognized by an
infinite tree automaton. Therefore, decidability of satisfiability of systems of set
constraints can easily be derived from Rabin's Tree Theorem [Rab69] because
infinite tree automata can be considered as an acceptor model for n-tuples of word
languages over a finite alphabet².

We extend this method to set constraints with symbols of arbitrary arity.
Therefore, we define an acceptor model for mappings from $T(\mathcal{F})$, where
$\mathcal{F}$ is a ranked alphabet, into a set $E = \{0, 1\}^n$ of labels. Our
automata can be viewed as an extension of infinite tree automata, but we will use
a weaker acceptance condition. The acceptance condition is: the range of a
successful run is in a specified set of accepting sets of states. We will prove that
we can design an automaton which recognizes the set of solutions of a system of
both positive and negative set constraints. For instance, let us consider the
following system:

$Y \not\subseteq \bot$    (5.4)
$X \subseteq f(Y, \sim X) \cup a$    (5.5)

where $\bot$ stands for the empty set and $\sim$ stands for the complement symbol.

² The entire class of Rabin's tree languages is not captured by solutions of set of
words constraints. Set of words constraints define a class of languages which is
strictly smaller than Büchi recognizable tree languages.

The underlying structure is different from the one in the previous example since
it is now the whole set of terms on the alphabet composed of a binary symbol f and
a constant a. Having a representation of this structure in mind is not trivial.
One can imagine a directed graph whose vertices are terms and such that there
exists an edge between each couple of terms in the direct subterm relation (see
Figure 5.3).

Figure 5.3: The (beginning of the) underlying structure for a two letter alphabet



An automaton has to associate a state with each node following a finite set
of rules. In the case of the example above, states are also couples of • or ◦.

Each vertex is of infinite out-degree; nonetheless one can define, as in the
word case, forbidden patterns for incoming vertices which such an automaton
has to avoid in order to satisfy Eq. (5.5) (see Fig. 5.4, where the pattern ? stands
for ◦ or •). The acceptance condition is illustrated using Eq. (5.4). Indeed, to
describe a solution of the system of set constraints, the pattern ?• must occur
somewhere in a successful "run" of the automaton.
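
Read directly from Eq. (5.5), a term f(t1, t2) in X is a forbidden configuration when t1 is not in Y or t2 is in X, and Eq. (5.4) asks that Y be non-empty. The sketch below checks these two conditions for finite candidate sets X and Y; the tuple encoding of terms and the function names are assumptions made for this illustration, not the automaton construction of the chapter.

```python
# Terms over {f/2, a/0} as nested tuples: ("a",) and ("f", left, right).
A = ("a",)


def f(left, right):
    """Build the term f(left, right)."""
    return ("f", left, right)


def forbidden_pattern(x, y):
    """Return a term of X violating X ⊆ f(Y, ~X) ∪ a, or None if there is none."""
    for t in sorted(x, key=repr):  # sorted only to make the result deterministic
        if t == A:
            continue  # the constant a is always allowed in X
        _, left, right = t
        if left not in y or right in x:
            return t
    return None


def is_solution(x, y):
    """(5.4): Y is not included in the empty set; (5.5): no forbidden pattern."""
    return len(y) > 0 and forbidden_pattern(x, y) is None


x_ok = {A, f(A, f(A, A))}
print(is_solution(x_ok, {A}))
print(forbidden_pattern({A, f(A, A)}, {A}))
```

In the second call, f(a, a) belongs to X while its second argument a also belongs to X, which is exactly one of the patterns excluded by (5.5).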



Figure 5.4: Forbidden patterns for (5.5).



Consequently, decidability of systems of set constraints is a consequence of
decidability of emptiness in our class of automata. Emptiness decidability is
easy for automata without acceptance conditions (it corresponds to the case of
positive set constraints only). The proof is more difficult and technical in the
general case and is not presented here. Moreover, and this is the main advantage
of an automaton-based method, properties of recognizable sets directly translate
to sets of solutions of systems of set constraints. Therefore, we are able to prove
nice properties. For instance, we can prove that a non-empty set of solutions
always contains a regular solution. Moreover we can prove the decidability of
existence of finite solutions.

5.2 Definitions and Examples

Infinite tree automata are an acceptor model for infinite trees, i.e. for mappings
from $A^*$ into E where A is a finite alphabet and E is a finite set of labels. We
define and study $\mathcal{F}$-generalized tree set automata which are an acceptor
model for mappings from $T(\mathcal{F})$ into E where $\mathcal{F}$ is a finite ranked
alphabet and E is a finite set of labels.

5.2.1 Generalized Tree Sets 

Let $\mathcal{F}$ be a ranked alphabet and E be a finite set. An E-valued
$\mathcal{F}$-generalized tree set g is a mapping from $T(\mathcal{F})$ into E. We
denote by $\mathcal{G}_E$ the set of E-valued $\mathcal{F}$-generalized tree sets.

For the sake of brevity, we do not mention the signature $\mathcal{F}$ which,
strictly speaking, should appear in "generalized tree sets". We also use the
abbreviation GTS for generalized tree sets.

Throughout the chapter, if $c \in \{0, 1\}^n$, then $c_i$ denotes the i-th component
of the tuple c. If we consider the set $E = \{0, 1\}^n$ for some n, a generalized tree
set g in $\mathcal{G}_{\{0,1\}^n}$ can be considered as an n-tuple
$(L_1, \ldots, L_n)$ of tree languages over the ranked alphabet $\mathcal{F}$ where
$L_i = \{t \in T(\mathcal{F}) \mid g(t)_i = 1\}$.

We will need in the chapter the following operations on generalized tree sets.
Let g (resp. g') be a generalized tree set in $\mathcal{G}_E$ (resp.
$\mathcal{G}_{E'}$). The generalized tree set
$g \uparrow g' \in \mathcal{G}_{E \times E'}$ is defined by
$g \uparrow g'(t) = (g(t), g'(t))$, for each term t in $T(\mathcal{F})$. Conversely
let g be a generalized tree set in $\mathcal{G}_{E \times E'}$ and consider the
projection $\pi$ from $E \times E'$ into the E-component; then $\pi(g)$ is the
generalized tree set in $\mathcal{G}_E$ defined by $\pi(g)(t) = \pi(g(t))$. Let
$G \subseteq \mathcal{G}_{E \times E'}$ and $G' \subseteq \mathcal{G}_E$; then
$\pi(G) = \{\pi(g) \mid g \in G\}$ and
$\pi^{-1}(G') = \{g \in \mathcal{G}_{E \times E'} \mid \pi(g) \in G'\}$.
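
To make the pairing and projection operations concrete, here is a small sketch in which a generalized tree set is modelled as a Python function from terms to labels. Everything in it (the tuple encoding of terms over a constant 0 and a unary s, the helper names `pair`, `project` and `language`, and the two sample labelings) is an assumption made for illustration; since $T(\mathcal{F})$ is infinite, the languages $L_i$ are only computed on a finite sample of terms.

```python
ZERO = ("0",)


def s(term):
    """Build the term s(term)."""
    return ("s", term)


def height(term):
    """Number of s symbols above the constant 0."""
    return 0 if term == ZERO else 1 + height(term[1])


def pair(g, g_prime):
    """g ↑ g': label each term t with the pair (g(t), g'(t))."""
    return lambda t: (g(t), g_prime(t))


def project(h, component=0):
    """π(h): keep one component of a pair-valued generalized tree set."""
    return lambda t: h(t)[component]


def language(h, i, sample):
    """L_i restricted to a finite sample: terms whose i-th label is 1."""
    return {t for t in sample if h(t)[i] == 1}


def g_even(t):
    """Hypothetical labeling: 1 on terms s^n(0) with n even."""
    return 1 if height(t) % 2 == 0 else 0


def g_pos(t):
    """Hypothetical labeling: 1 on every term except 0."""
    return 0 if t == ZERO else 1


h = pair(g_even, g_pos)
sample = [ZERO, s(ZERO), s(s(ZERO))]
print(language(h, 0, sample))
print(language(h, 1, sample))
```

Projecting the paired labeling back on its first component gives `g_even` again on every term, which is the identity $\pi(g \uparrow g') = g$.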

5.2.2 Tree Set Automata 

A generalized tree set automaton $A = (Q, \Delta, \Omega)$ (GTSA) over a finite set
E consists of a finite state set Q, a transition relation
$\Delta \subseteq \bigcup_p Q^p \times \mathcal{F}_p \times E \times Q$
and a set $\Omega \subseteq 2^Q$ of accepting sets of states.

A run of A (or A-run) on a generalized tree set $g \in \mathcal{G}_E$ is a mapping
$r : T(\mathcal{F}) \to Q$ with:

$(r(t_1), \ldots, r(t_p), f, g(f(t_1, \ldots, t_p)), r(f(t_1, \ldots, t_p))) \in \Delta$

for $t_1, \ldots, t_p \in T(\mathcal{F})$ and $f \in \mathcal{F}_p$. The run r is
successful if the range of r is in $\Omega$, i.e. $r(T(\mathcal{F})) \in \Omega$.

A generalized tree set $g \in \mathcal{G}_E$ is accepted by the automaton A if some
run r of A on g is successful. We denote by $\mathcal{L}(A)$ the set of E-valued
generalized tree sets accepted by a generalized tree set automaton A over E. A set
$G \subseteq \mathcal{G}_E$ is recognizable if $G = \mathcal{L}(A)$ for some
generalized tree set automaton A.

In the following, a rule $(q_1, \ldots, q_p, f, l, q)$ is also denoted by
$f(q_1, \ldots, q_p) \xrightarrow{l} q$. Consider a term $t = f(t_1, \ldots, t_p)$ and
a rule $f(q_1, \ldots, q_p) \xrightarrow{l} q$; this rule can be applied in a run r
on a generalized tree set g for the term t if $r(t_1) = q_1, \ldots, r(t_p) = q_p$
and t is labeled by l, i.e. $g(t) = l$. If the rule is applied, then $r(t) = q$.

A generalized tree set automaton $A = (Q, \Delta, \Omega)$ over E is

• deterministic if for each tuple
$(q_1, \ldots, q_p, f, l) \in Q^p \times \mathcal{F}_p \times E$ there is at
most one state $q \in Q$ such that $(q_1, \ldots, q_p, f, l, q) \in \Delta$.

• strongly deterministic if for each tuple
$(q_1, \ldots, q_p, f) \in Q^p \times \mathcal{F}_p$ there is at most one pair
$(l, q) \in E \times Q$ such that $(q_1, \ldots, q_p, f, l, q) \in \Delta$.

• complete if for each tuple
$(q_1, \ldots, q_p, f, l) \in Q^p \times \mathcal{F}_p \times E$ there is at least
one state $q \in Q$ such that $(q_1, \ldots, q_p, f, l, q) \in \Delta$.

• simple if $\Omega$ is "subset-closed", that is
$\omega \in \Omega \Rightarrow (\forall \omega' \subseteq \omega,\ \omega' \in \Omega)$.
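
The four properties above are finite checks on $\Delta$ and $\Omega$, so they can be tested directly. The sketch below assumes a hypothetical encoding in which a rule $(q_1, \ldots, q_p, f, l, q)$ is the Python tuple `((q1, ..., qp), f, l, q)`, the ranked alphabet is a dict from symbols to arities, and $\Omega$ is a set of frozensets of states; none of these names come from the text.

```python
from itertools import combinations, product


def is_deterministic(rules):
    """At most one target state for each (q1..qp, f, l)."""
    seen = {}
    for args, symbol, label, target in rules:
        if seen.setdefault((args, symbol, label), target) != target:
            return False
    return True


def is_strongly_deterministic(rules):
    """At most one pair (l, q) for each (q1..qp, f)."""
    seen = {}
    for args, symbol, label, target in rules:
        if seen.setdefault((args, symbol), (label, target)) != (label, target):
            return False
    return True


def is_complete(states, arity, labels, rules):
    """At least one target state for each (q1..qp, f, l)."""
    covered = {(args, symbol, label) for args, symbol, label, _ in rules}
    return all(
        (args, symbol, label) in covered
        for symbol, n in arity.items()
        for args in product(states, repeat=n)
        for label in labels
    )


def is_simple(omega):
    """Omega is subset-closed: every subset of an accepting set is accepting."""
    return all(
        frozenset(subset) in omega
        for accepting in omega
        for k in range(len(accepting) + 1)
        for subset in combinations(accepting, k)
    )


# A tiny hypothetical automaton over {a/0, f/2} with a single state q.
rules = {((), "a", 0, "q"), (("q", "q"), "f", 1, "q")}
print(is_deterministic(rules), is_strongly_deterministic(rules))
print(is_complete({"q"}, {"a": 0, "f": 2}, {0, 1}, rules))
print(is_simple({frozenset(), frozenset({"q"})}))
```

Note that `is_simple` takes subset-closure literally, so the empty set must belong to $\Omega$; this is harmless because the range of a run is never empty.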

Successfulness for simple automata just implies that some states are not assumed
along a run. For instance, if the accepting set of a GTSA $\mathcal{A}$ is $\Omega = 2^Q$, then $\mathcal{A}$ is
simple and any run is successful. But if $\Omega = \{Q\}$, then $\mathcal{A}$ is not simple and each
state must be assumed at least once in a successful run. The definition of simple
automata will become clearer through its relationship with set constraints and the
emptiness property (see Section 5.4). Briefly, positive set constraints are related
to simple GTSA, for which the proof of emptiness decision is straightforward.
Another, equivalent, definition for simple GTSA relies on the acceptance
condition: a run $r$ is successful if and only if $r(T(\mathcal{F})) \subseteq \omega$ for some $\omega \in \Omega$.
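The four properties above are finite checks on $\Delta$ and $\Omega$, so they can be tested mechanically. The following is a minimal Python sketch, not taken from the text: the encoding of a rule $(q_1, \dots, q_p, f, l, q)$ as a tuple `(kids, f, label, q)`, the `arity` dictionary and all function names are assumptions made for illustration. The sample data encodes the automaton of Ex. 49.1 below, with the labels read off from the description of its unique run.

```python
from itertools import combinations, product

# Assumed encoding: a rule (q_1, ..., q_p, f, l, q) is stored as
# (kids, f, label, q) with kids = (q_1, ..., q_p).

def is_deterministic(rules):
    """At most one target state for each (kids, f, label)."""
    targets = {}
    for kids, f, label, q in rules:
        if targets.setdefault((kids, f, label), q) != q:
            return False
    return True

def is_strongly_deterministic(rules):
    """At most one (label, state) pair for each (kids, f)."""
    targets = {}
    for kids, f, label, q in rules:
        if targets.setdefault((kids, f), (label, q)) != (label, q):
            return False
    return True

def is_complete(states, arity, labels, rules):
    """At least one target state for each (kids, f, label)."""
    covered = {(kids, f, label) for kids, f, label, _ in rules}
    return all((kids, f, label) in covered
               for f, p in arity.items()
               for kids in product(states, repeat=p)
               for label in labels)

def is_simple(omega):
    """Omega is subset-closed."""
    omega = {frozenset(w) for w in omega}
    return all(frozenset(sub) in omega
               for w in omega
               for k in range(len(w) + 1)
               for sub in combinations(w, k))

def powerset(states):
    """All subsets of a finite state set, as frozensets."""
    states = list(states)
    return [frozenset(c) for k in range(len(states) + 1)
            for c in combinations(states, k)]

# Ex. 49.1 under the assumed encoding.
Q = ["Nat", "List", "Term"]
ARITY = {"0": 0, "nil": 0, "s": 1, "cons": 2}
RULES = {((), "0", 0, "Nat"), (("Nat",), "s", 0, "Nat"),
         ((), "nil", 1, "List"), (("Nat", "List"), "cons", 1, "List")}
RULES |= {((q,), "s", 0, "Term") for q in Q if q != "Nat"}
RULES |= {((q, q2), "cons", 0, "Term")
          for q in Q for q2 in Q if (q, q2) != ("Nat", "List")}
```

With this data, `is_strongly_deterministic(RULES)` and `is_simple(powerset(Q))` hold, while `is_complete(Q, ARITY, [0, 1], RULES)` fails because, for instance, no rule labels the constant 0 by 1.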

There is in general an infinite number of runs, and hence an infinite
number of GTS recognized, even in the case of deterministic generalized tree
set automata (see Example 49.2). Nonetheless, given a GTS $g$, there is at most
one run on $g$ for a deterministic generalized tree set automaton. In the case
of strongly deterministic generalized tree set automata, there is at most one run
(see Example 49.1) and therefore at most one GTS recognized.



Example 49.

Ex. 49.1 Let $E = \{0, 1\}$, $\mathcal{F} = \{\mathrm{cons}(,), s(), \mathrm{nil}, 0\}$. Let $\mathcal{A} = (Q, \Delta, \Omega)$ be
defined by $Q = \{\mathrm{Nat}, \mathrm{List}, \mathrm{Term}\}$, $\Omega = 2^Q$, and $\Delta$ is the following set of
rules:

$0 \xrightarrow{0} \mathrm{Nat}$ ; $s(\mathrm{Nat}) \xrightarrow{0} \mathrm{Nat}$ ; $\mathrm{nil} \xrightarrow{1} \mathrm{List}$ ;
$\mathrm{cons}(\mathrm{Nat}, \mathrm{List}) \xrightarrow{1} \mathrm{List}$ ;
$\mathrm{cons}(q, q') \xrightarrow{0} \mathrm{Term}$  $\forall (q, q') \neq (\mathrm{Nat}, \mathrm{List})$ ;
$s(q) \xrightarrow{0} \mathrm{Term}$  $\forall q \neq \mathrm{Nat}$ .

$\mathcal{A}$ is strongly deterministic, simple, and not complete. $\mathcal{L}(\mathcal{A})$ is a singleton
set. Indeed, there is a unique run $r$ on a unique generalized tree set
$g \in \mathcal{G}_{\{0,1\}}$. The run $r$ maps every natural number to state Nat, every list to
state List, and every other term to state Term. Therefore $g$ maps a natural
number to 0, a list to 1, and every other term to 0. Hence, we say that $\mathcal{L}(\mathcal{A})$
is the regular tree language $L$ of Lisp-like lists of natural numbers.

Ex. 49.2 Let $E = \{0, 1\}$, $\mathcal{F} = \{\mathrm{cons}(,), s(), \mathrm{nil}, 0\}$, and let $\mathcal{A}' = (Q', \Delta', \Omega')$
be defined by $Q' = Q$, $\Omega' = \Omega$, and

$\Delta' = \Delta \cup \{\mathrm{cons}(\mathrm{Nat}, \mathrm{List}) \xrightarrow{0} \mathrm{List},\ \mathrm{nil} \xrightarrow{0} \mathrm{List}\}$.

$\mathcal{A}'$ is deterministic (but not strongly), simple, and not complete, and $\mathcal{L}(\mathcal{A}')$
is the set of all subsets of the regular tree language $L$ of Lisp-like lists of
natural numbers. Indeed, successful runs can now be defined on generalized
tree sets $g$ such that a term in $L$ is labeled by 0 or 1.

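Ex. 49.3 Let $E = \{0, 1\}^2$, $\mathcal{F} = \{\mathrm{cons}(,), s(), \mathrm{nil}, 0\}$, and let $\mathcal{A} = (Q, \Delta, \Omega)$
be defined by $Q = \{\mathrm{Nat}, \mathrm{Nat}', \mathrm{List}, \mathrm{Term}\}$, $\Omega = 2^Q$, and $\Delta$ is the following
set of rules:

$0 \xrightarrow{(0,0)} \mathrm{Nat}$ ; $0 \xrightarrow{(1,0)} \mathrm{Nat}'$ ;
$s(\mathrm{Nat}) \xrightarrow{(0,0)} \mathrm{Nat}$ ; $s(\mathrm{Nat}') \xrightarrow{(1,0)} \mathrm{Nat}'$ ;
$s(\mathrm{Nat}) \xrightarrow{(1,0)} \mathrm{Nat}'$ ; $s(\mathrm{Nat}') \xrightarrow{(0,0)} \mathrm{Nat}$ ;
$\mathrm{nil} \xrightarrow{(0,1)} \mathrm{List}$ ; $\mathrm{cons}(\mathrm{Nat}', \mathrm{List}) \xrightarrow{(0,1)} \mathrm{List}$ ;
$s(q) \xrightarrow{(0,0)} \mathrm{Term}$  $\forall q \neq \mathrm{Nat}$ ;
$\mathrm{cons}(q, q') \xrightarrow{(0,0)} \mathrm{Term}$  $\forall (q, q') \neq (\mathrm{Nat}', \mathrm{List})$ .

$\mathcal{A}$ is deterministic, simple, and not complete, and $\mathcal{L}(\mathcal{A})$ is the set of 2-tuples
of tree languages $(N', L')$ where $N'$ is a subset of the regular tree language
of natural numbers and $L'$ is the set of Lisp-like lists of natural numbers
over $N'$.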

Let us remark that the set $N'$ may be non-regular. For instance, one can
define a run on a characteristic generalized tree set $g_p$ of Lisp-like lists of
prime numbers. The generalized tree set $g_p$ is such that $g_p(t) = (1, 0)$ when
$t$ is a (code of a) prime number.
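Returning to Ex. 49.1: because the automaton is strongly deterministic, the state and the label of a term are both forced by the states of its subterms, so the unique run $r$ and the unique generalized tree set $g$ can be computed bottom-up on any finite term. The Python sketch below illustrates this; the encoding of terms as nested tuples and the function name `run_and_label` are assumptions made for the illustration.

```python
def run_and_label(term):
    """Return the pair (r(t), g(t)) forced by the rules of Ex. 49.1.

    A term is a nested tuple such as ("s", ("0",)).
    """
    f, kids = term[0], term[1:]
    states = tuple(run_and_label(k)[0] for k in kids)
    if f == "0":
        return ("Nat", 0)
    if f == "nil":
        return ("List", 1)
    if f == "s":
        return ("Nat", 0) if states == ("Nat",) else ("Term", 0)
    if f == "cons":
        return ("List", 1) if states == ("Nat", "List") else ("Term", 0)
    raise ValueError(f"unknown symbol {f!r}")

ZERO, NIL = ("0",), ("nil",)
ONE = ("s", ZERO)
LIST_10 = ("cons", ONE, ("cons", ZERO, NIL))   # the list (1 0)
```

Here `run_and_label(LIST_10)` yields state List with label 1, whereas an ill-formed term such as `("cons", NIL, ZERO)` is sent to state Term with label 0.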

In the previous examples, we only considered simple generalized tree set
automata. Moreover, all runs are successful runs. The following examples are
non-simple generalized tree set automata, chosen to make clear the interest of
acceptance conditions. For this, compare the sets of generalized tree sets
obtained in Examples 49.3 and 50, and note that with acceptance conditions we
can express that a set is non-empty.



Example 50. Example 49.3 continued

Let $E = \{0, 1\}^2$, $\mathcal{F} = \{\mathrm{cons}(,), \mathrm{nil}, s(), 0\}$, and let $\mathcal{A}' = (Q', \Delta', \Omega')$ be
defined by $Q' = Q$, $\Delta' = \Delta$, and $\Omega' = \{\omega \in 2^Q \mid \mathrm{Nat}' \in \omega\}$. $\mathcal{A}'$ is deterministic,
not simple, and not complete, and $\mathcal{L}(\mathcal{A}')$ is the set of 2-tuples of tree languages
$(N', L')$ where $N'$ is a subset of the regular tree language of natural numbers,
$L'$ is the set of Lisp-like lists of natural numbers over $N'$, and $N' \neq \emptyset$.
Indeed, for a successful run $r$ on $g$, there must be a term $t$ such that $r(t) = \mathrm{Nat}'$;
therefore there must be a term $t$ labelled by $(1, 0)$, hence $N' \neq \emptyset$.

5.2.3 Hierarchy of GTSA-recognizable Languages

Let us define:

• $\mathcal{R}_{GTS}$, the class of languages recognizable by GTSA,

• $\mathcal{R}_{DGTS}$, the class of languages recognizable by deterministic GTSA,

• $\mathcal{R}_{SGTS}$, the class of languages recognizable by simple GTSA.

The three classes defined above are proved to be different. They are also
closely related to classes of languages defined from the set constraint theory
point of view.




Figure 5.5: Classes of GTSA-recognizable languages 



Classes of GTSA-recognizable languages also have different closure
properties. We will prove in Section 5.3.1 that $\mathcal{R}_{SGTS}$ and the entire class $\mathcal{R}_{GTS}$
are closed under union, intersection, projection and cylindrification; $\mathcal{R}_{DGTS}$ is
closed under complementation and intersection.

We propose three examples that illustrate the differences between the three
classes. First, $\mathcal{R}_{DGTS}$ is not a subset of $\mathcal{R}_{SGTS}$.



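Example 51. Let $E = \{0, 1\}$, $\mathcal{F} = \{f, a\}$ where $a$ is a constant and $f$ is unary.
Let us consider the deterministic but non-simple GTSA $\mathcal{A}_1 = (\{q_0, q_1\}, \Delta_1, \Omega_1)$
where $\Delta_1$ is:

$a \xrightarrow{0} q_0$,   $f(q_0) \xrightarrow{0} q_0$,   $f(q_0) \xrightarrow{1} q_1$,
$a \xrightarrow{1} q_1$,   $f(q_1) \xrightarrow{0} q_0$,   $f(q_1) \xrightarrow{1} q_0$

and $\Omega_1 = \{\{q_0, q_1\}, \{q_1\}\}$. Let us prove that

$\mathcal{L}(\mathcal{A}_1) = \{L \mid L \neq \emptyset\}$

is not in $\mathcal{R}_{SGTS}$.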

Assume that there exists a simple GTSA $\mathcal{A}_s$ with $n$ states such that $\mathcal{L}(\mathcal{A}_1) =
\mathcal{L}(\mathcal{A}_s)$. Hence, $\mathcal{A}_s$ also recognizes each of the singleton sets $\{f^i(a)\}$ for $i > 0$.
Consider some $i$ greater than $n + 1$. We can deduce that a run $r$ on the
GTS $g$ associated with $\{f^i(a)\}$ maps two terms $f^k(a)$ and $f^l(a)$, $k < l < i$, to
the same state. We have $g(t) = 0$ for every term $t \trianglelefteq f^l(a)$, and $r$ "loops" between
$f^k(a)$ and $f^l(a)$. Therefore, one can build another run $r_0$ on a GTS $g_0$ such
that $g_0(t) = 0$ for each $t \in T(\mathcal{F})$. Since $\mathcal{A}_s$ is simple, and since the range of $r_0$
is a subset of the range of $r$, $g_0$ is recognized; hence the empty set is recognized,
which contradicts the hypothesis.

Basically, using simple GTSA it is not possible to force a state to be
assumed somewhere by every run. Consequently, it is not possible to express
global properties of generalized tree languages such as non-emptiness.

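Second, $\mathcal{R}_{SGTS}$ is not a subset of $\mathcal{R}_{DGTS}$.

Example 52. Let us consider the non-deterministic but simple GTSA $\mathcal{A}_2 =
(\{q_f, q_h\}, \Delta_2, \Omega_2)$ where $\Delta_2$ is:

$a \xrightarrow{0} q_f \mid q_h$,   $a \xrightarrow{1} q_f \mid q_h$,
$f(q_f) \xrightarrow{1} q_f \mid q_h$,   $h(q_h) \xrightarrow{1} q_f \mid q_h$,
$f(q_h) \xrightarrow{0} q_f \mid q_h$,   $h(q_f) \xrightarrow{0} q_f \mid q_h$,

and $\Omega_2 = 2^{\{q_f, q_h\}}$. It is easy to prove that $\mathcal{L}(\mathcal{A}_2) = \{L \mid \forall t\ f(t) \in L \Leftrightarrow h(t) \notin L\}$.
The proof that no deterministic GTSA recognizes $\mathcal{L}(\mathcal{A}_2)$ is left to the reader.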

We conclude with an example of a non-deterministic and non-simple
generalized tree set automaton. This example will be used in the proof of
Proposition 36.

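Example 53. Let $\mathcal{A} = (Q, \Delta, \Omega)$ be defined by $Q = \{q, q'\}$, $\Omega = \{Q\}$, and $\Delta$ is
the following set of rules:

$a \xrightarrow{1} q$ ; $a \xrightarrow{1} q'$ ; $a \xrightarrow{0} q'$ ; $f(q) \xrightarrow{1} q$ ;
$f(q') \xrightarrow{0} q'$ ; $f(q') \xrightarrow{1} q'$ ; $f(q') \xrightarrow{1} q$ ;

The proof that $\mathcal{A}$ is not deterministic, not simple, and not complete, and that
$\mathcal{L}(\mathcal{A}) = \{L \subseteq T(\mathcal{F}) \mid \exists t \in T(\mathcal{F})\ ((t \in L) \land (\forall t' \in T(\mathcal{F})\ (t \trianglelefteq t') \Rightarrow (t' \in L)))\}$, is
left as an exercise to the reader.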



5.2.4 Regular Generalized Tree Sets, Regular Runs 

As mentioned in Example 49.3, the set recognized by a GTSA may contain
GTS corresponding to non-regular languages. But regularity is of major interest
for practical reasons, because it implies that a GTS or a language is finitely defined.
A generalized tree set $g \in \mathcal{G}_E$ is regular if there exist a finite set $R$, a
mapping $\alpha : T(\mathcal{F}) \to R$, and a mapping $\beta : R \to E$ satisfying the following two
properties.

1. $g = \alpha\beta$ (i.e. $g = \beta \circ \alpha$),
2. $\alpha$ is closed under contexts, i.e. for all contexts $c$ and terms $t_1, t_2$, we have
$(\alpha(t_1) = \alpha(t_2)) \Rightarrow (\alpha(c[t_1]) = \alpha(c[t_2]))$.

In the case $E = \{0, 1\}^n$, regular generalized tree sets correspond to $n$-tuples
of regular tree languages.
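As an illustration of this definition, the Python sketch below builds a regular generalized tree set over the hypothetical signature $\{0, s()\}$ with $E = \{0, 1\}$: the characteristic function of the even natural numbers. The finite set $R$ is `{"even", "odd"}`; the table `STEP`, the dictionary `BETA` and the helper `nat` are assumptions chosen for the example. Since $\alpha$ is computed bottom-up from a finite table, its value on $f(t_1, \dots, t_p)$ depends only on the values on the $t_i$, which is one way to obtain closure under contexts.

```python
# alpha is evaluated bottom-up from a finite table, so alpha(f(t_1, ..., t_p))
# depends only on alpha(t_1), ..., alpha(t_p).
STEP = {("0", ()): "even", ("s", ("even",)): "odd", ("s", ("odd",)): "even"}
BETA = {"even": 1, "odd": 0}          # beta : R -> E

def alpha(term):
    """alpha : T(F) -> R, for terms encoded as nested tuples."""
    f, kids = term[0], term[1:]
    return STEP[(f, tuple(alpha(k) for k in kids))]

def g(term):
    """The generalized tree set g = beta o alpha."""
    return BETA[alpha(term)]

def nat(n):
    """The term s^n(0)."""
    term = ("0",)
    for _ in range(n):
        term = ("s", term)
    return term
```

For instance `g(nat(4))` is 1 and `g(nat(3))` is 0, and `nat(2)` and `nat(4)` are identified by `alpha` together with all terms obtained by wrapping them in the same context.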

Although the definition of regularity could lead to a definition of regular
run (because a run can be considered as a generalized tree set in $\mathcal{G}_Q$), we use
stronger conditions for a run to be regular. Indeed, if we define regular runs
as regular generalized tree sets in $\mathcal{G}_Q$, regularity of generalized tree sets and
regularity of runs do not correspond in general. For instance, one could define
regular runs on non-regular generalized tree sets in the case of non-strongly
deterministic generalized tree set automata, and one could define non-regular runs
on regular generalized tree sets in the case of non-deterministic generalized tree
set automata. Therefore, we only consider regular runs on regular generalized
tree sets:

A run $r$ on a generalized tree set $g$ is regular if $r \uparrow g \in \mathcal{G}_{E \times Q}$
is regular. Consequently, $r$ and $g$ are regular generalized tree sets.

Proposition 33. Let $\mathcal{A}$ be a generalized tree set automaton. If $g$ is a regular
generalized tree set in $\mathcal{L}(\mathcal{A})$, then there exists a regular $\mathcal{A}$-run on $g$.

Proof. Consider a generalized tree set automaton $\mathcal{A} = (Q, \Delta, \Omega)$ over $E$ and a
regular generalized tree set $g$ in $\mathcal{L}(\mathcal{A})$, and let $r$ be a successful run on $g$. Let
$L$ be a finite tree language closed under the subterm relation and such that
$\mathcal{F}_0 \subseteq L$ and $r(L) = r(T(\mathcal{F}))$. The generalized tree set $g$ is regular, therefore
there exist a finite set $R$, a mapping $\alpha : T(\mathcal{F}) \to R$ closed under context, and a
mapping $\beta : R \to E$ such that $g = \alpha\beta$. We now define a regular run $r'$ on $g$.

Let $L_\star = L \cup \{\star\}$ where $\star$ is a new constant symbol and let $\varphi$ be the mapping
from $T(\mathcal{F})$ into $Q \times R \times L_\star$ defined by $\varphi(t) = (r(t), \alpha(t), u)$ where $u = t$ if $t \in L$
and $u = \star$ otherwise. Hence $R' = \varphi(T(\mathcal{F}))$ is a finite set because $R' \subseteq Q \times R \times L_\star$.
For each $\rho$ in $R'$, let us fix $t_\rho \in T(\mathcal{F})$ such that $\varphi(t_\rho) = \rho$.

The run $r'$ is now (regularly) defined via two mappings $\alpha'$ and $\beta'$. Let $\beta'$ be
the projection from $Q \times R \times L_\star$ into $Q$ and let $\alpha' : T(\mathcal{F}) \to R'$ be inductively
defined by:

$\forall a \in \mathcal{F}_0 \quad \alpha'(a) = \varphi(a)$;

and

$\forall f \in \mathcal{F}_p\ \forall t_1, \dots, t_p \in T(\mathcal{F}) \quad \alpha'(f(t_1, \dots, t_p)) = \varphi(f(t_{\alpha'(t_1)}, \dots, t_{\alpha'(t_p)}))$.

Let $r' = \alpha'\beta'$. First we can easily prove by induction that $\forall t \in L\ \alpha'(t) = \varphi(t)$
and deduce that $\forall t \in L\ r'(t) = r(t)$. Thus $r'$ and $r$ coincide on $L$. It remains
to prove that (1) the mapping $\alpha'$ is closed under context, (2) $r'$ is a run on $g$,
and (3) $r'$ is a successful run.

(1) From the definition of $\alpha'$ we can easily derive that the mapping $\alpha'$ is closed
under context.

(2) We prove that the mapping $r' = \alpha'\beta'$ is a run on $g$, that is, if $t = f(t_1, \dots, t_p)$
then $(r'(t_1), \dots, r'(t_p), f, g(t), r'(t)) \in \Delta$.

Let us consider a term $t = f(t_1, \dots, t_p)$. From the definitions of $\alpha'$, $\beta'$, and
$r'$, we get $r'(t) = r(t')$ with $t' = f(t_{\alpha'(t_1)}, \dots, t_{\alpha'(t_p)})$. The mapping $r$ is a run
on $g$, hence $(r(t_{\alpha'(t_1)}), \dots, r(t_{\alpha'(t_p)}), f, g(t'), r(t')) \in \Delta$, and thus it suffices to
prove that $g(t) = g(t')$ and, for all $i$, $r'(t_i) = r(t_{\alpha'(t_i)})$.

Let $i \in \{1, \dots, p\}$. We have $r'(t_i) = \beta'(\alpha'(t_i))$ by definition of $r'$. By definition of $t_{\alpha'(t_i)}$,
$\alpha'(t_i) = \varphi(t_{\alpha'(t_i)})$, therefore $r'(t_i) = \beta'(\varphi(t_{\alpha'(t_i)}))$. Now, using the definitions
of $\varphi$ and $\beta'$, we get $r'(t_i) = r(t_{\alpha'(t_i)})$.

In order to prove that $g(t) = g(t')$, we prove that $\alpha(t) = \alpha(t')$. Let $\pi$ be
the projection from $R'$ into $R$. We have $\alpha(t') = \pi(\varphi(t'))$ by definition of
$\varphi$ and $\pi$. We have $\alpha(t') = \pi(\alpha'(t))$ using the definitions of $t'$ and $\alpha'$. Now
$\alpha(t') = \pi(\varphi(t_{\alpha'(t)}))$ because $\varphi(t_{\alpha'(t)}) = \alpha'(t)$ by definition of $t_{\alpha'(t)}$. And then
$\alpha(t') = \alpha(t_{\alpha'(t)})$ by definition of $\pi$ and $\varphi$. Therefore it remains to prove that
$\alpha(t_{\alpha'(t)}) = \alpha(t)$. The proof is by induction on the structure of terms.

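If $t \in \mathcal{F}_0$ then $t_{\alpha'(t)} = t$, so the property holds (note that this property holds
for all $t \in L$). Let us suppose that $t = f(t_1, \dots, t_p)$ and $\alpha(t_{\alpha'(t_i)}) = \alpha(t_i)$ $\forall i \in
\{1, \dots, p\}$. First, using the induction hypothesis and closure under context of $\alpha$,
we get

$\alpha(f(t_1, \dots, t_p)) = \alpha(f(t_{\alpha'(t_1)}, \dots, t_{\alpha'(t_p)}))$

Therefore,

$\alpha(f(t_1, \dots, t_p)) = \alpha(f(t_{\alpha'(t_1)}, \dots, t_{\alpha'(t_p)}))$
$= \pi(\varphi(f(t_{\alpha'(t_1)}, \dots, t_{\alpha'(t_p)})))$   (def. of $\varphi$ and $\pi$)
$= \pi(\alpha'(f(t_1, \dots, t_p)))$   (def. of $\alpha'$)
$= \pi(\varphi(t_{\alpha'(f(t_1, \dots, t_p))}))$   (def. of $t_{\alpha'(f(t_1, \dots, t_p))}$)
$= \alpha(t_{\alpha'(f(t_1, \dots, t_p))})$   (def. of $\varphi$ and $\pi$).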

(3) We have $r'(T(\mathcal{F})) = r'(L) = r(L) = r(T(\mathcal{F}))$ using the definition of $r'$, the
definition of $L$, and the equality $r'(L) = r(L)$. The run $r$ is a successful run.
Consequently $r'$ is a successful run.

□ 

Proposition 34. A non-empty recognizable set of generalized tree sets contains
a regular generalized tree set.

Proof. Let us consider a generalized tree set automaton $\mathcal{A}$ and a successful run
$r$ on a generalized tree set $g$. There exists a finite tree language $F$, closed under the
subterm relation, such that $r(F) = r(T(\mathcal{F}))$. We define a regular run $rr$ on a
regular generalized tree set $gg$ in the following way.

The run $rr$ coincides with $r$ on $F$: $\forall t \in F$, $rr(t) = r(t)$ and $gg(t) = g(t)$. The
mappings $rr$ and $gg$ are inductively defined on $T(\mathcal{F}) \setminus F$: given $q_1, \dots, q_p$ in $r(T(\mathcal{F}))$,
let us fix a rule $f(q_1, \dots, q_p) \xrightarrow{l} q$ such that $q \in r(T(\mathcal{F}))$. The rule exists since
$r$ is a run. Therefore, $\forall t = f(t_1, \dots, t_p) \notin F$ such that $rr(t_i) = q_i$ for all $i \leq p$,
we define $rr(t) = q$ and $gg(t) = l$, following the fixed rule $f(q_1, \dots, q_p) \xrightarrow{l} q$.

□

From the preceding, we can also deduce that a finite and recognizable set of
generalized tree sets only contains regular generalized tree sets.

5.3 Closure and Decision Properties

5.3.1 Closure properties 

This section is dedicated to the study of classical closure properties on
GTSA-recognizable languages. For all positive results (union, intersection,
projection, cylindrification) the proofs are constructive. We show that the class of
recognizable sets of generalized tree sets is not closed under complementation
and that non-determinism cannot be reduced for generalized tree set automata.
Set operations on sets of GTS have to be distinguished from set operations
on sets of terms. In particular, in the case where $E = \{0, 1\}^n$, if $G_1$ and $G_2$ are
sets of GTS in $\mathcal{G}_E$, then $G_1 \cup G_2$ consists of all GTS that are in $G_1$ or in $G_2$. This is clearly
different from the set of all $(L_1^1 \cup L_1^2, \dots, L_n^1 \cup L_n^2)$ where $(L_1^1, \dots, L_n^1)$ belongs
to $G_1$ and $(L_1^2, \dots, L_n^2)$ belongs to $G_2$.

Proposition 35. The class $\mathcal{R}_{GTS}$ is closed under intersection and union, i.e. if
$G_1, G_2 \subseteq \mathcal{G}_E$ are recognizable, then $G_1 \cup G_2$ and $G_1 \cap G_2$ are recognizable.

This proof is an easy modification of the classical proof of closure properties
for tree automata, see Chapter 1.

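Proof. Let $\mathcal{A}_1 = (Q_1, \Delta_1, \Omega_1)$ and $\mathcal{A}_2 = (Q_2, \Delta_2, \Omega_2)$ be two generalized tree
set automata over $E$. Without loss of generality we assume that $Q_1 \cap Q_2 = \emptyset$.

Let $\mathcal{A} = (Q, \Delta, \Omega)$ with $Q = Q_1 \cup Q_2$, $\Delta = \Delta_1 \cup \Delta_2$, and $\Omega = \Omega_1 \cup \Omega_2$. It is
immediate that $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_1) \cup \mathcal{L}(\mathcal{A}_2)$.

We denote by $\pi_1$ and $\pi_2$ the projections from $Q_1 \times Q_2$ into respectively $Q_1$
and $Q_2$. Let $\mathcal{A}' = (Q', \Delta', \Omega')$ with $Q' = Q_1 \times Q_2$, where $\Delta'$ is defined by

$(f(q_1, \dots, q_p) \xrightarrow{l} q \in \Delta') \Leftrightarrow (\forall i \in \{1, 2\}\ f(\pi_i(q_1), \dots, \pi_i(q_p)) \xrightarrow{l} \pi_i(q) \in \Delta_i)$,

where $q_1, \dots, q_p, q \in Q'$, $f \in \mathcal{F}_p$, $l \in E$, and $\Omega'$ is defined by

$\Omega' = \{\omega \in 2^{Q'} \mid \pi_i(\omega) \in \Omega_i,\ i \in \{1, 2\}\}$.

One can easily verify that $\mathcal{L}(\mathcal{A}') = \mathcal{L}(\mathcal{A}_1) \cap \mathcal{L}(\mathcal{A}_2)$.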

□ 
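The two constructions in this proof are directly executable on finite data. The Python sketch below is an illustration under assumed conventions: an automaton is a triple `(states, rules, omega)`, a rule is a tuple `(kids, f, label, q)`, and the two sample automata `A1` and `A2` are hypothetical. Building $\Omega'$ by enumerating all subsets of $Q_1 \times Q_2$ is exponential and only meant for tiny examples.

```python
from itertools import combinations, product

def powerset(states):
    """All subsets of a finite set, as frozensets."""
    states = list(states)
    return [frozenset(c) for k in range(len(states) + 1)
            for c in combinations(states, k)]

def union(a1, a2):
    """Union automaton; the two state sets are assumed disjoint."""
    (q1, d1, o1), (q2, d2, o2) = a1, a2
    return (q1 | q2, d1 | d2, o1 | o2)

def intersection(a1, a2):
    """Product automaton over Q1 x Q2 (equal symbols have equal arity)."""
    (q1, d1, o1), (q2, d2, o2) = a1, a2
    states = frozenset(product(q1, q2))
    # Pair up rules that agree on the symbol and on the label.
    rules = {(tuple(zip(k1, k2)), f1, l1, (t1, t2))
             for (k1, f1, l1, t1) in d1
             for (k2, f2, l2, t2) in d2
             if f1 == f2 and l1 == l2}
    # Keep the sets of pairs whose two projections are both accepting.
    omega = {w for w in powerset(states)
             if frozenset(p for p, _ in w) in o1
             and frozenset(p for _, p in w) in o2}
    return (states, rules, omega)

# Two small hypothetical automata over F = {a, f()} and E = {0, 1}.
A1 = (frozenset({"p"}),
      {((), "a", 0, "p"), ((), "a", 1, "p"),
       (("p",), "f", 0, "p"), (("p",), "f", 1, "p")},
      {frozenset({"p"})})
A2 = (frozenset({"x", "y"}),
      {((), "a", 1, "x"), ((), "a", 0, "y"), (("x",), "f", 1, "x"),
       (("y",), "f", 0, "y"), (("y",), "f", 1, "x")},
      {frozenset({"x"}), frozenset({"x", "y"})})
```

Calling `intersection(A1, A2)` yields two product states and five product rules, one for each pair of rules agreeing on symbol and label.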

Let us remark that the previous constructions also prove that the class $\mathcal{R}_{SGTS}$
is closed under union and intersection.

The class of languages recognizable by deterministic generalized tree set
automata is closed under complementation. But this property is false in the
general case of GTSA-recognizable languages.

Proposition 36. (a) Let $\mathcal{A}$ be a generalized tree set automaton. There exists
a complete generalized tree set automaton $\mathcal{A}_c$ such that $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_c)$.

(b) If $\mathcal{A}_{cd}$ is a deterministic and complete generalized tree set automaton, there
exists a generalized tree set automaton $\mathcal{A}'$ such that $\mathcal{L}(\mathcal{A}') = \mathcal{G}_E - \mathcal{L}(\mathcal{A}_{cd})$.

(c) The class of GTSA-recognizable languages is not closed under
complementation.

(d) Non-determinism cannot be reduced for generalized tree set automata.

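Proof. (a) Let $\mathcal{A} = (Q, \Delta, \Omega)$ be a generalized tree set automaton over $E$ and let
$q'$ be a new state, i.e. $q' \notin Q$. Let $\mathcal{A}_c = (Q_c, \Delta_c, \Omega_c)$ be defined by $Q_c = Q \cup \{q'\}$,
$\Omega_c = \Omega$, and

$\Delta_c = \Delta \cup \{(q_1, \dots, q_p, f, l, q') \mid \{(q_1, \dots, q_p, f, l)\} \times Q \cap \Delta = \emptyset;\ q_1, \dots, q_p \in Q_c,\ f \in \mathcal{F}_p,\ l \in E\}$.

$\mathcal{A}_c$ is complete and $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}_c)$. Note that $\mathcal{A}_c$ is simple if $\mathcal{A}$ is simple.

(b) Let $\mathcal{A}_{cd} = (Q, \Delta, \Omega)$ be a deterministic and complete generalized tree set
automaton over $E$. The automaton $\mathcal{A}' = (Q', \Delta', \Omega')$ with $Q' = Q$, $\Delta' = \Delta$,
and $\Omega' = 2^Q - \Omega$ recognizes the set $\mathcal{G}_E - \mathcal{L}(\mathcal{A}_{cd})$.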

(c) Let $E = \{0, 1\}$, $\mathcal{F} = \{c, a\}$ where $a$ is a constant and $c$ is of arity 1. Let

$G = \{g \in \mathcal{G}_{\{0,1\}} \mid \exists t \in T(\mathcal{F})\ ((g(t) = 1) \land (\forall t' \in T(\mathcal{F})\ (t \trianglelefteq t') \Rightarrow (g(t') = 1)))\}$.

Clearly, $G$ is recognizable by a non-deterministic GTSA (see Example 53). Let
$\overline{G} = \mathcal{G}_{\{0,1\}} - G$. We have $\overline{G} = \{g \in \mathcal{G}_{\{0,1\}} \mid \forall t \in T(\mathcal{F})\ \exists t' \in T(\mathcal{F})\ (t \trianglelefteq t') \land
(g(t') = 0)\}$, and $\overline{G}$ is not recognizable. Let us suppose that $\overline{G}$ is recognized
by an automaton $\mathcal{A} = (Q, \Delta, \Omega)$ with $\mathrm{Card}(Q) = k - 2$ and let us consider the
generalized tree set $g$ defined by: $g(c^i(a)) = 0$ if $i = k \times z$ for some integer $z$,
and $g(c^i(a)) = 1$ otherwise. The generalized tree set $g$ is in $\overline{G}$ and we consider
a successful run $r$ on $g$. We have $r(T(\mathcal{F})) = \omega \in \Omega$, therefore there exists some
integer $n$ such that $r(\{c^i(a) \mid i \leq n\}) = \omega$. Moreover we can suppose that $n$
is a multiple of $k$. As $\mathrm{Card}(Q) = k - 2$, there are two terms $u$ and $v$ in the set
$\{c^i(a) \mid n + 1 \leq i \leq n + k - 1\}$ such that $r(u) = r(v)$. Note that by hypothesis, for
all $i$ such that $n + 1 \leq i \leq n + k - 1$, $g(c^i(a)) = 1$. Consequently, a successful run
$r'$ could be defined from $r$ on the generalized tree set $g'$ defined by $g'(t) = g(t)$
if $t = c^i(a)$ with $i \leq n$, and $g'(t) = 1$ otherwise. This leads to a contradiction
because $g' \notin \overline{G}$.

(d) This result is a consequence of (b) and (c).

□
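Parts (a) and (b) of the proof are simple finite constructions. The Python sketch below illustrates them under assumed conventions: rules are tuples `(kids, f, label, q)`, `arity` maps each symbol to its arity, and the sink state name `"q_sink"` is a hypothetical choice for the new state $q'$.

```python
from itertools import combinations, product

def complete(states, arity, labels, rules, sink="q_sink"):
    """Part (a): add a fresh sink state and send every missing
    (kids, f, label) combination to it. Omega is left unchanged."""
    assert sink not in states
    all_states = set(states) | {sink}
    covered = {(kids, f, label) for kids, f, label, _ in rules}
    missing = {(kids, f, label, sink)
               for f, p in arity.items()
               for kids in product(all_states, repeat=p)
               for label in labels
               if (kids, f, label) not in covered}
    return all_states, set(rules) | missing

def complement_acceptance(states, omega):
    """Part (b): return 2^Q minus Omega. This complements the recognized
    set only when the automaton is deterministic and complete."""
    states = list(states)
    subsets = {frozenset(c) for k in range(len(states) + 1)
               for c in combinations(states, k)}
    return subsets - {frozenset(w) for w in omega}
```

The completed automaton keeps the original accepting sets, so a run that ever enters the sink state cannot be successful unless some accepting set already contained it, which is impossible for a fresh state.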

We will now prove the closure under projection and cylindrification. We will
first prove a stronger lemma.

Lemma 8. Let $G \subseteq \mathcal{G}_{E_1}$ be a GTSA-recognizable language and let $R \subseteq E_1 \times E_2$.
The set $R(G) = \{g' \in \mathcal{G}_{E_2} \mid \exists g \in G\ \forall t \in T(\mathcal{F})\ (g(t), g'(t)) \in R\}$ is recognizable.

Proof. Let $\mathcal{A} = (Q, \Delta, \Omega)$ be such that $\mathcal{L}(\mathcal{A}) = G$. Let $\mathcal{A}' = (Q', \Delta', \Omega')$ where
$Q' = Q$, $\Delta' = \{f(q_1, \dots, q_p) \xrightarrow{l'} q \mid \exists l \in E_1\ f(q_1, \dots, q_p) \xrightarrow{l} q \in \Delta \text{ and } (l, l') \in R\}$
and $\Omega' = \Omega$. We prove that $R(G) = \mathcal{L}(\mathcal{A}')$.

$\supseteq$ Let $g' \in \mathcal{L}(\mathcal{A}')$ and let $r'$ be a successful run on $g'$. We construct a generalized
tree set $g$ such that for all $t \in T(\mathcal{F})$, $(g(t), g'(t)) \in R$ and such that $r'$ is
also a successful $\mathcal{A}$-run on $g$.

Let $a$ be a constant. According to the definition of $\Delta'$, $a \xrightarrow{g'(a)} r'(a) \in \Delta'$
implies that there exists $l_a$ such that $(l_a, g'(a)) \in R$ and $a \xrightarrow{l_a} r'(a) \in \Delta$.
So let $g(a) = l_a$.

Let $t = f(t_1, \dots, t_p)$ with $\forall i\ r'(t_i) = q_i$. There exists a rule $f(q_1, \dots, q_p) \xrightarrow{g'(t)} r'(t)$
in $\Delta'$ because $r'$ is a run on $g'$ and again, from the definition of $\Delta'$, there
exists $l_t \in E_1$ such that $f(q_1, \dots, q_p) \xrightarrow{l_t} r'(t)$ is in $\Delta$ with $(l_t, g'(t)) \in R$.
So, we define $g(t) = l_t$. Clearly, $g$ is a generalized tree set, $r'$ is a
successful run on $g$, and for all $t \in T(\mathcal{F})$, $(g(t), g'(t)) \in R$ by construction.

$\subseteq$ Let $g' \in R(G)$ and let $g \in G$ be such that $\forall t \in T(\mathcal{F})\ (g(t), g'(t)) \in R$. One can
easily prove that any successful $\mathcal{A}$-run on $g$ is also a successful $\mathcal{A}'$-run on $g'$.

□

Let us recall that if $g$ is a generalized tree set in $\mathcal{G}_{E_1 \times \dots \times E_n}$, the $i$th projection
of $g$ (on the $E_i$-component, $1 \leq i \leq n$) is the GTS $\pi_i(g)$ defined as follows: let $\pi$ be the mapping from
$E_1 \times \dots \times E_n$ into $E_i$ such that $\pi(l_1, \dots, l_n) = l_i$, and let $\pi_i(g)(t) = \pi(g(t))$ for
every term $t$. Conversely, the $i$th cylindrification of a GTS $g$, denoted by $\pi_i^{-1}(g)$,
is the set of GTS $g'$ such that $\pi_i(g') = g$. Projection and cylindrification are
extended to sets of GTS in the usual way.

Corollary 7. (a) The class of GTSA-recognizable languages is closed under
projection and cylindrification.

(b) Let $G \subseteq \mathcal{G}_E$ and $G' \subseteq \mathcal{G}_{E'}$ be two GTSA-recognizable languages. The set
$G \uparrow G' = \{g \uparrow g' \mid g \in G,\ g' \in G'\}$ is a GTSA-recognizable language in
$\mathcal{G}_{E \times E'}$.

Proof. (a) The case of projection is an immediate consequence of Lemma 8
using $E_1 = E \times E'$, $E_2 = E$, and $R = \pi$ where $\pi$ is the projection from
$E \times E'$ into $E$. The case of cylindrification is proved in a similar way.

(b) Consequence of (a) and of Proposition 35 because $G \uparrow G' = \pi_1^{-1}(G) \cap
\pi_2^{-1}(G')$ where $\pi_1^{-1}$ (respectively $\pi_2^{-1}$) is the inverse projection from $E$ to
$E \times E'$ (respectively from $E'$ to $E \times E'$).
Let us remark that the construction preserves simplicity, so $\mathcal{R}_{SGTS}$ is closed
under projection and cylindrification.

□
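The relabeling of Lemma 8 only rewrites the labels of the rules, which the following Python sketch makes explicit. The rule encoding `(kids, f, label, q)` is an assumption, and so is the reading of cylindrification as Lemma 8 applied to the inverse of the projection relation (the proof above only says that this case is similar).

```python
def relabel(rules, relation):
    """Rules of A' in Lemma 8: a rule labeled l yields one rule labeled l'
    for every pair (l, l') in the relation R."""
    return {(kids, f, l2, q)
            for (kids, f, l1, q) in rules
            for (l, l2) in relation
            if l == l1}

E, E_PRIME = (0, 1), (0, 1)
# Projection from E x E' onto E, seen as a relation between labels.
PROJECT_1 = {((l, m), l) for l in E for m in E_PRIME}
# Its inverse, assumed here to give the first cylindrification.
CYLINDER_1 = {(l, (l, m)) for l in E for m in E_PRIME}
```

For example, projecting the single rule `((), "a", (1, 0), "q")` gives `((), "a", 1, "q")`, and cylindrifying that rule gives back two rules, one for each possible second component.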

We now consider the case $E = \{0, 1\}^n$ and we give two propositions without
proof. Proposition 37 can easily be deduced from Corollary 7. The proof of
Proposition 38 is an extension of the constructions made in Examples 49.1 and
49.2.

Proposition 37. Let $\mathcal{A}$ and $\mathcal{A}'$ be two generalized tree set automata over
$\{0, 1\}^n$.

(a) $\{(L_1 \cup L'_1, \dots, L_n \cup L'_n) \mid (L_1, \dots, L_n) \in \mathcal{L}(\mathcal{A}) \text{ and } (L'_1, \dots, L'_n) \in \mathcal{L}(\mathcal{A}')\}$
is recognizable.

(b) $\{(L_1 \cap L'_1, \dots, L_n \cap L'_n) \mid (L_1, \dots, L_n) \in \mathcal{L}(\mathcal{A}) \text{ and } (L'_1, \dots, L'_n) \in \mathcal{L}(\mathcal{A}')\}$
is recognizable.

(c) $\{(\overline{L_1}, \dots, \overline{L_n}) \mid (L_1, \dots, L_n) \in \mathcal{L}(\mathcal{A})\}$ is recognizable, where $\overline{L_i} = T(\mathcal{F}) - L_i$, $\forall i$.

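Proposition 38. Let $E = \{0, 1\}^n$ and let $(F_1, \dots, F_n)$ be an $n$-tuple of regular
tree languages. There exist deterministic simple generalized tree set automata
$\mathcal{A}$, $\mathcal{A}'$, and $\mathcal{A}''$ such that

• $\mathcal{L}(\mathcal{A}) = \{(F_1, \dots, F_n)\}$;

• $\mathcal{L}(\mathcal{A}') = \{(L_1, \dots, L_n) \mid L_1 \subseteq F_1, \dots, L_n \subseteq F_n\}$;

• $\mathcal{L}(\mathcal{A}'') = \{(L_1, \dots, L_n) \mid F_1 \subseteq L_1, \dots, F_n \subseteq L_n\}$.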

5.3.2 Emptiness Property 

Theorem 44. The emptiness property is decidable in the class of generalized
tree set automata: given a generalized tree set automaton $\mathcal{A}$, it is decidable
whether $\mathcal{L}(\mathcal{A}) = \emptyset$.

Labels of the generalized tree sets are meaningless for the emptiness
decision, thus we consider "label-free" generalized tree set automata. Briefly, the
transition relation of a "label-free" generalized tree set automaton is a relation
$\Delta \subseteq \bigcup_p Q^p \times \mathcal{F}_p \times Q$.

The emptiness decision algorithm for simple generalized tree set automata
is straightforward. Indeed, let $\omega$ be a subset of $Q$ and let COND($\omega$) be the
following condition:

$\forall p\ \forall f \in \mathcal{F}_p\ \forall q_1, \dots, q_p \in \omega\ \exists q \in \omega \quad (q_1, \dots, q_p, f, q) \in \Delta$

We easily prove that there exists a set $\omega$ satisfying COND($\omega$) if and only if
there exists an $\mathcal{A}$-run. Therefore, the emptiness problem for simple generalized
tree set automata is decidable because $2^Q$ is finite and COND($\omega$) is decidable.
The emptiness problem for simple generalized tree set automata
is in fact NP-complete (see Prop. 39).
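The condition COND($\omega$) can be checked by brute force on a label-free automaton. The Python sketch below does so and then searches the accepting sets for one that satisfies it; restricting the search to $\omega \in \Omega$ is our reading of how the condition is used for a simple automaton, where any run whose range is included in an accepting set is successful. The encoding of label-free rules as triples `(kids, f, q)` and the sample data are assumptions.

```python
from itertools import product

def cond(w, arity, rules):
    """COND(w): for every symbol f and every tuple of states taken from w,
    some rule leads back into w. Rules are triples (kids, f, q)."""
    w = set(w)
    return all(any((kids, f, q) in rules for q in w)
               for f, p in arity.items()
               for kids in product(w, repeat=p))

def simple_gtsa_is_nonempty(omega, arity, rules):
    """Assumed test for a simple automaton: some accepting set satisfies COND."""
    return any(cond(w, arity, rules) for w in omega)

ARITY = {"a": 0, "f": 1}
OMEGA = [frozenset(), frozenset({"q0"}), frozenset({"q1"}),
         frozenset({"q0", "q1"})]                     # 2^Q for Q = {q0, q1}
RULES_STUCK = {((), "a", "q0"), (("q0",), "f", "q1")}  # no rule for f(q1)
RULES_OK = RULES_STUCK | {(("q1",), "f", "q1")}
```

With `RULES_STUCK` no subset of states satisfies the condition, since no rule applies above a term in state `q1`; adding the rule for `f(q1)` makes `{q0, q1}` satisfy it.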

The proof is more intricate in the general case, and it is not given in this
book. Without the simplicity property, we have to deal with a reachability
problem for a set of states, since we have to check that there exist $\omega \in \Omega$ and a
run $r$ such that $r$ assumes exactly all the states in $\omega$.

We conclude this section with a complexity result for the emptiness problem
in the class of generalized tree set automata.

Let us remark that a finite initial fragment of a "label-free" generalized tree
set corresponds to a finite set of terms that is closed under the subterm relation.
The size, or the number of nodes, of such an initial fragment is the number of
different terms in the subterm-closed set of terms (the cardinality of the set of
terms). The size of a GTSA is given by:

$\|\mathcal{A}\| = |Q| + \sum_{f(q_1, \dots, q_p) \xrightarrow{l} q \in \Delta} (\mathrm{arity}(f) + 3) + \sum_{\omega \in \Omega} |\omega|$.

Let us consider a GTSA $\mathcal{A}$ with $n$ states. The proof shows that one must
consider at most all initial fragments of runs (hence corresponding to finite
tree languages closed under the subterm relation) of size smaller than $B(\mathcal{A})$,
a polynomial in $n$, in order to decide emptiness for $\mathcal{A}$. Let us remark that the
polynomial bound $B(\mathcal{A})$ can be computed. The emptiness proof relies on the
following lemma:

Lemma 9. There exists a polynomial function $f$ of degree 4 such that:

Let $\mathcal{A} = (Q, \Delta, \Omega)$ be a GTSA. There exists a successful run $r_s$ such
that $r_s(T(\mathcal{F})) = \omega \in \Omega$ if and only if there exist a run $r_m$ and a
closed tree language $F$ such that:

• $r_m(T(\mathcal{F})) = r_m(F) = \omega$;

• $\mathrm{Card}(F) \leq f(n)$ where $n$ is the number of states in $\omega$.

Proposition 39. The emptiness problem in the class of (simple) generalized
tree set automata is NP-complete.

Proof. Let $\mathcal{A} = (Q, \Delta, \Omega)$ be a generalized tree set automaton over $E$. Let
$n = \mathrm{Card}(Q)$.

We first give a non-deterministic and polynomial algorithm for deciding
emptiness: (1) take a tree language $F$ closed under the subterm relation such
that the number of different terms in it is smaller than $B(\mathcal{A})$; (2) take a run
$r$ on $F$; (3) compute $r(F)$; (4) check whether $r(F) = \omega$ is a member of $\Omega$; (5)
check whether $\omega$ satisfies COND($\omega$).

From Theorem 44, this algorithm is correct and complete. Moreover, this
algorithm is polynomial in $n$ since (1) the size of $F$ is polynomial in $n$; step
(2) consists in labeling the nodes of $F$ with states following the rules of the
automaton, so there is a polynomial number of states; step (3) consists in
collecting the states; step (4) is polynomial and non-deterministic; and finally,
step (5) is polynomial.

We reduce the satisfiability problem of boolean expressions to the emptiness
problem for generalized tree set automata. We first build a generalized
tree set automaton $\mathcal{A}$ such that $\mathcal{L}(\mathcal{A})$ is the set of (codes of) satisfiable boolean
expressions over $n$ variables $\{x_1, \dots, x_n\}$.

Let $\mathcal{F} = \mathcal{F}_0 \cup \mathcal{F}_1 \cup \mathcal{F}_2$ where $\mathcal{F}_0 = \{x_1, \dots, x_n\}$, $\mathcal{F}_1 = \{\neg\}$, and $\mathcal{F}_2 = \{\land, \lor\}$.
A boolean expression is a term of $T(\mathcal{F})$. Let $\mathrm{Bool} = \{0, 1\}$ be the set of boolean
values. Let $\mathcal{A} = (Q, \Delta, \Omega)$ be a generalized tree set automaton such that
$Q = \{q_0, q_1\}$, $\Omega = 2^Q$, and $\Delta$ is the following set of rules:

$x_j \xrightarrow{i} q_i$ where $j \in \{1, \dots, n\}$ and $i \in \mathrm{Bool}$
$\neg(q_i) \xrightarrow{\neg i} q_{\neg i}$ where $i \in \mathrm{Bool}$
$\lor(q_{i_1}, q_{i_2}) \xrightarrow{i_1 \lor i_2} q_{i_1 \lor i_2}$ where $i_1, i_2 \in \mathrm{Bool}$
$\land(q_{i_1}, q_{i_2}) \xrightarrow{i_1 \land i_2} q_{i_1 \land i_2}$ where $i_1, i_2 \in \mathrm{Bool}$

One can easily prove that $\mathcal{L}(\mathcal{A}) = \{L_v \mid v \text{ is a valuation of } \{x_1, \dots, x_n\}\}$
where $L_v = \{t \mid t \text{ is a boolean expression which is true under } v\}$. $L_v$ corresponds
to a run $r_v$ on a GTS $g_v$, and $g_v$ labels each $x_j$ either by 0 or 1. Hence, $g_v$ can
be considered as a valuation $v$ of $x_1, \dots, x_n$. This valuation is extended in $g_v$ to
every node, that is to say that every term (representing a boolean expression)
is labeled either by 0 or 1 according to the usual interpretation of $\neg$, $\land$, $\lor$. A
given boolean expression is hence labeled by 1 if and only if it is true under the
valuation $v$.

Now, we can derive an algorithm for the satisfiability of any boolean
expression $e$: build $\mathcal{A}_e$, a generalized tree set automaton such that $\mathcal{L}(\mathcal{A}_e)$ is the set of
all tree languages containing $e$: $\{L \mid e \in L\}$; build $\mathcal{A}_e \cap \mathcal{A}$ and decide emptiness.

We then get the reduction because $\mathcal{A}_e \cap \mathcal{A}$ is empty if and only if $e$ is not
satisfiable.

Now, it remains to prove that the reduction is polynomial. The size of $\mathcal{A}$
is $2n + 10$. The size of $\mathcal{A}_e$ is the length of $e$ plus a constant. So we get the
result. □
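The correspondence between valuations and the generalized tree sets $g_v$ can be made concrete. The Python sketch below computes the label $g_v(t)$ of a boolean expression bottom-up, exactly as the rules of $\mathcal{A}$ force it, and then tests satisfiability by enumerating valuations. It is an exponential brute-force illustration of the correspondence, not the polynomial reduction itself; the symbol names `"not"`, `"or"`, `"and"` and the tuple encoding of terms are assumptions.

```python
from itertools import product

def label(term, v):
    """Label g_v(t) of a boolean expression t under the valuation v.

    Terms are nested tuples; a variable x_j is the 1-tuple ("x_j",).
    """
    f, kids = term[0], term[1:]
    if f == "not":
        return 1 - label(kids[0], v)
    if f == "or":
        return label(kids[0], v) | label(kids[1], v)
    if f == "and":
        return label(kids[0], v) & label(kids[1], v)
    return v[f]          # a variable, labeled by its value under v

def satisfiable(e, variables):
    """Check by enumeration that e belongs to L_v for some valuation v."""
    return any(label(e, dict(zip(variables, bits))) == 1
               for bits in product((0, 1), repeat=len(variables)))

X1, X2 = ("x1",), ("x2",)
```

For example, `satisfiable(("and", X1, ("not", X1)), ["x1", "x2"])` is false, while `satisfiable(("or", X1, X2), ["x1", "x2"])` is true.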

5.3.3 Other Decision Results

Proposition 40. The inclusion problem and the equivalence problem for
deterministic generalized tree set automata are decidable.

Proof. These results are a consequence of the closure properties under
intersection and complementation (Propositions 35, 36), and the decidability of the
emptiness property (Theorem 44). □

Proposition 41. Let $\mathcal{A}$ be a generalized tree set automaton. It is decidable
whether or not $\mathcal{L}(\mathcal{A})$ is a singleton set.

Proof. Let $\mathcal{A}$ be a generalized tree set automaton. First, it is decidable whether
$\mathcal{L}(\mathcal{A})$ is empty or not (Theorem 44). Second, if $\mathcal{L}(\mathcal{A})$ is non-empty then a regular
generalized tree set $g$ in $\mathcal{L}(\mathcal{A})$ can be constructed (see the proof of Theorem
44). Construct the strongly deterministic generalized tree set automaton $\mathcal{A}'$
such that $\mathcal{L}(\mathcal{A}')$ is a singleton set reduced to the generalized tree set $g$. Finally,
build $\mathcal{A} \cap \overline{\mathcal{A}'}$ to decide the equivalence of $\mathcal{A}$ and $\mathcal{A}'$. Note that we can build $\overline{\mathcal{A}'}$,
since $\mathcal{A}'$ is deterministic (see Proposition 36). □

Proposition 42. Let $L = (L_1, \dots, L_n)$ be a tuple of regular tree languages and
let $\mathcal{A}$ be a generalized tree set automaton over $\{0, 1\}^n$. It is decidable whether
$L \in \mathcal{L}(\mathcal{A})$.

Proof. This result just follows from closure under intersection and emptiness
decidability.

First construct a (strongly deterministic) generalized tree set automaton $\mathcal{A}_L$
such that $\mathcal{L}(\mathcal{A}_L)$ is reduced to the singleton set $\{L\}$. Second, construct $\mathcal{A} \cap \mathcal{A}_L$
and decide whether $\mathcal{L}(\mathcal{A} \cap \mathcal{A}_L)$ is empty or not. □

Proposition 43. Let $\mathcal{A}$ be a generalized tree set automaton over $E = \{0, 1\}^n$
and let $I \subseteq \{1, \dots, n\}$. The following two problems are decidable:

1. It is decidable whether or not there exists $(L_1, \dots, L_n)$ in $\mathcal{L}(\mathcal{A})$ such that
all the $L_i$ are finite for $i \in I$.

2. Let $x_1, \dots, x_n$ be natural numbers. It is decidable whether or not there
exists $(L_1, \dots, L_n)$ in $\mathcal{L}(\mathcal{A})$ such that $\mathrm{Card}(L_i) = x_i$ for each $i \in I$.

The proof is technical and not given in this book. It relies on Lemma 9 of
the emptiness decision proof.

5.4 Applications to Set Constraints

In this section, we consider the satisfiability problem for systems of set
constraints. We show a decision algorithm using generalized tree set automata.

5.4.1 Definitions

Let $\mathcal{F}$ be a finite and non-empty set of function symbols. Let $\mathcal{X}$ be a set of
variables. We consider special symbols $\top$, $\bot$, $\sim$, $\cup$, $\cap$ of respective arities 0, 0,
1, 2, 2. A set expression is a term in $T_{\mathcal{F}'}(\mathcal{X})$ where $\mathcal{F}' = \mathcal{F} \cup \{\top, \bot, \sim, \cup, \cap\}$.

A set constraint is either a positive set constraint of the form $e \subseteq e'$ or a
negative set constraint of the form $e \not\subseteq e'$ (or $\neg(e \subseteq e')$) where $e$ and $e'$ are set
expressions, and a system of set constraints is defined by $\bigwedge_{i=1}^{k} SC_i$ where the
$SC_i$ are set constraints.

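An interpretation $\mathcal{I}$ is a mapping from $\mathcal{X}$ into $2^{T(\mathcal{F})}$. It can immediately be
extended to set expressions in the following way:

$\mathcal{I}(\top) = T(\mathcal{F})$;
$\mathcal{I}(\bot) = \emptyset$;
$\mathcal{I}(f(e_1, \dots, e_p)) = f(\mathcal{I}(e_1), \dots, \mathcal{I}(e_p))$;
$\mathcal{I}(\sim e) = T(\mathcal{F}) \setminus \mathcal{I}(e)$;
$\mathcal{I}(e \cup e') = \mathcal{I}(e) \cup \mathcal{I}(e')$;
$\mathcal{I}(e \cap e') = \mathcal{I}(e) \cap \mathcal{I}(e')$.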

We deduce an interpretation of set constraints in $\mathrm{Bool} = \{0, 1\}$, the Boolean
values. For a system of set constraints $SC$, all the interpretations $\mathcal{I}$ such that
$\mathcal{I}(SC) = 1$ are called solutions of $SC$. In the remainder, we will consider
systems of set constraints of $n$ variables $X_1, \dots, X_n$. We will make no distinction
between a solution $\mathcal{I}$ of a system of set constraints and an $n$-tuple of tree languages
$(\mathcal{I}(X_1), \dots, \mathcal{I}(X_n))$. We denote by $SOL(SC)$ the set of all solutions of a system
of set constraints $SC$.
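Since $T(\mathcal{F})$ is infinite, an interpretation cannot be stored as a set, but membership of a given term in $\mathcal{I}(e)$ can be decided recursively from the clauses above. The Python sketch below does this under assumed encodings: set expressions are nested tuples tagged `"top"`, `"bot"`, `"var"`, `"not"`, `"or"`, `"and"`, or `"app"` (for a function symbol applied to set expressions), and an interpretation maps each variable name to a membership predicate on terms. It reads $f(\mathcal{I}(e_1), \dots, \mathcal{I}(e_p))$ as the set of terms $f(t_1, \dots, t_p)$ with each $t_i \in \mathcal{I}(e_i)$.

```python
def member(t, e, interp):
    """Decide whether the term t belongs to I(e).

    interp maps each variable name to a membership predicate on terms.
    """
    op = e[0]
    if op == "top":
        return True
    if op == "bot":
        return False
    if op == "var":
        return interp[e[1]](t)
    if op == "not":
        return not member(t, e[1], interp)
    if op == "or":
        return member(t, e[1], interp) or member(t, e[2], interp)
    if op == "and":
        return member(t, e[1], interp) and member(t, e[2], interp)
    if op == "app":                      # ("app", f, e_1, ..., e_p)
        f, args = e[1], e[2:]
        return (t[0] == f and len(t) - 1 == len(args)
                and all(member(ti, ei, interp)
                        for ti, ei in zip(t[1:], args)))
    raise ValueError(f"unknown operator {op!r}")

def is_nat(t):
    """Membership predicate for the terms s^n(0)."""
    return t == ("0",) or (t[0] == "s" and len(t) == 2 and is_nat(t[1]))

# X is interpreted as the natural numbers; E_NAT encodes the union of 0 and s(X).
INTERP = {"X": is_nat}
E_NAT = ("or", ("app", "0"), ("app", "s", ("var", "X")))
```

Under this interpretation, every natural number belongs to `E_NAT` and to `X`, which is what one would check term by term to see that the interpretation is compatible with the constraint stating that the two sides are equal.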

5.4.2 Set Constraints and Automata 

Proposition 44. Let $SC$ be a system of set constraints (respectively of positive
set constraints) of $n$ variables $X_1, \dots, X_n$. There exists a deterministic
(respectively deterministic and simple) generalized tree set automaton $\mathcal{A}$ over $\{0, 1\}^n$
such that $\mathcal{L}(\mathcal{A})$ is the set of characteristic generalized tree sets of the $n$-tuples
$(L_1, \dots, L_n)$ of solutions of $SC$.

Proof. First we reduce the problem to a single set constraint. Let $SC = C_1 \land
\dots \land C_k$ be a system of set constraints. A solution of $SC$ satisfies all the
constraints $C_i$. Let us suppose that, for every $i$, there exists a deterministic
generalized tree set automaton $\mathcal{A}_i$ such that $SOL(C_i) = \mathcal{L}(\mathcal{A}_i)$. Since not all variables in
$\{X_1, \dots, X_n\}$ necessarily occur in $C_i$, using Corollary 7, we can construct
a deterministic generalized tree set automaton $\mathcal{A}_i^n$ over $\{0, 1\}^n$ satisfying: $\mathcal{L}(\mathcal{A}_i^n)$
is the set of $(L_1, \dots, L_n)$ which correspond to solutions of $C_i$ when restricted
to the variables of $C_i$. Using closure under intersection (Proposition 35), we can
construct a deterministic generalized tree set automaton $\mathcal{A}$ over $\{0, 1\}^n$ such
that $SOL(SC) = \mathcal{L}(\mathcal{A})$.

Therefore we prove the result for a set constraint SC of n variables $X_1, \ldots, X_n$.
Let $E(exp)$ be the set of set variables and of set expressions $exp'$ with a root symbol
in $\mathcal{F}$ which occur in the set expression $exp$:

$E(exp) = \{exp' \in T(\mathcal{F}', \mathcal{X}) \mid exp' \trianglelefteq exp$ and such that either $Head(exp') \in \mathcal{F}$ or $exp' \in \mathcal{X}\}$.

If $SC \equiv exp_1 \subseteq exp_2$ or $SC \equiv exp_1 \not\subseteq exp_2$ then $E(SC) = E(exp_1) \cup E(exp_2)$.

Let us consider a set constraint SC and let $\varphi$ be a mapping from $E(SC)$
into Bool. Such a mapping is easily extended first to any set expression occurring
in SC and second to the set constraint SC. The symbols $\cup$, $\cap$, $\sim$, $\subseteq$ and $\not\subseteq$ are
respectively interpreted as $\vee$, $\wedge$, $\neg$, $\Rightarrow$ and $\neg\Rightarrow$.
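
The extension of the mapping from E(SC) to arbitrary set expressions and to the constraint itself can be illustrated by a small sketch. The tuple encoding of set expressions, the function names `atoms`, `extend` and `holds`, and the reading of the constants top and bottom as 1 and 0 are assumptions made for this illustration; they are not taken from the construction above.

```python
# Set expressions as nested tuples (an encoding assumed for this sketch):
#   ("var", "X"), ("top",), ("bot",), ("not", e), ("or", e1, e2),
#   ("and", e1, e2), and ("app", f, e1, ..., ep) for a symbol f of F.

def atoms(e):
    """E(exp): the variables and the subexpressions whose root is in F."""
    kind = e[0]
    if kind == "var":
        return {e}
    if kind == "app":
        result = {e}
        for sub in e[2:]:
            result |= atoms(sub)
        return result
    result = set()
    for sub in e[1:]:
        result |= atoms(sub)
    return result

def extend(phi, e):
    """Extend phi : E(SC) -> Bool to a set expression built over E(SC)."""
    kind = e[0]
    if kind in ("var", "app"):
        return phi[e]
    if kind == "top":          # assumption: top is read as 1
        return True
    if kind == "bot":          # assumption: bottom is read as 0
        return False
    if kind == "not":
        return not extend(phi, e[1])
    if kind == "or":
        return extend(phi, e[1]) or extend(phi, e[2])
    if kind == "and":
        return extend(phi, e[1]) and extend(phi, e[2])
    raise ValueError(kind)

def holds(phi, lhs, rhs, positive=True):
    """Value of phi on the constraint lhs ⊆ rhs (or lhs ⊈ rhs)."""
    implied = (not extend(phi, lhs)) or extend(phi, rhs)
    return implied if positive else not implied
```

For the constraint b(X) ⊆ X, for instance, E(SC) consists of X and b(X), and a mapping that sends b(X) to 1 and X to 0 gives the constraint the value 0.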

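We now define the generalized tree set automaton $\mathcal{A} = (Q, \Delta, \Omega)$ over $E =
\{0,1\}^n$.

• The set of states $Q$ is the set $\{\varphi \mid \varphi : E(SC) \to Bool\}$.

• The transition relation is defined as follows: $f(\varphi_1, \ldots, \varphi_p) \xrightarrow{l} \varphi \in \Delta$ where
$\varphi_1, \ldots, \varphi_p \in Q$, $f \in \mathcal{F}_p$, $l = (l_1, \ldots, l_n) \in \{0,1\}^n$, and $\varphi \in Q$ satisfies:

$\forall i \in \{1, \ldots, n\} \quad \varphi(X_i) = l_i$ (5.6)

$\forall e \in E(SC) \setminus \mathcal{X} \quad (\varphi(e) = 1) \Leftrightarrow (e = f(e_1, \ldots, e_p)$ and $\forall i,\ 1 \le i \le p,\ \varphi_i(e_i) = 1)$ (5.7)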

• The set of accepting sets of states $\Omega$ is defined depending on the case of a
positive or a negative set constraint.

- If SC is positive, $\Omega = \{\omega \in 2^Q \mid \forall \varphi \in \omega \ \varphi(SC) = 1\}$;

- If SC is negative, $\Omega = \{\omega \in 2^Q \mid \exists \varphi \in \omega \ \varphi(SC) = 1\}$.

In the case of a positive set constraint, we can choose the state set $Q = \{\varphi \mid
\varphi(SC) = 1\}$ and $\Omega = 2^Q$. Consequently, $\mathcal{A}$ is deterministic and simple.

The correctness of this construction is easy to prove and is left to the reader.

□

5.4.3 Decidability Results for Set Constraints 

We now summarize results on set constraints. These results are immediate
consequences of the results of Section 5.4.2. We use Proposition 44 to encode
sets of solutions of systems of set constraints with generalized tree set automata
and then, each point is deduced from Theorem 44, or Propositions 38, 43, 40,
41.
Properties on sets of solutions

Satisfiability The satisfiability problem for Systems of set constraints is decid- 
able. 

Regular solution There exists a regular solution, that is a tuple of regular
tree languages, in any non-empty set of solutions.

Inclusion, Equivalence Given two systems of set constraints SC and SC', it
is decidable whether or not $SOL(SC) \subseteq SOL(SC')$.

Unicity Given a system SC of set constraints, it is decidable whether or not
there is a unique solution in SOL(SC).

Properties on solutions 

Fixed cardinalities, singletons Given a system SC of set constraints over
$(X_1, \ldots, X_n)$, $I \subseteq \{1, \ldots, n\}$, and $x_1, \ldots, x_n$ natural numbers;

• it is decidable whether or not there is a solution $(L_1, \ldots, L_n) \in
SOL(SC)$ such that $Card(L_i) = x_i$ for each $i \in I$.

• it is decidable whether or not all the $L_i$ are finite for $i \in I$.
In both cases, proofs are constructive and exhibit a solution.

Membership Given SC a system of set constraints over $(X_1, \ldots, X_n)$ and an
n-tuple $(L_1, \ldots, L_n)$ of regular tree languages, it is decidable whether or
not $(L_1, \ldots, L_n) \in SOL(SC)$.

Proposition 45. Let SC be a system of positive set constraints. It is decidable
whether or not there is a least solution in SOL(SC).

Proof. Let SC be a system of positive set constraints. Let $\mathcal{A}$ be the deterministic,
simple generalized tree set automaton over $\{0,1\}^n$ such that $\mathcal{L}(\mathcal{A}) =
SOL(SC)$ (see Proposition 44). We define a partial ordering $\preceq$ on $\mathcal{G}_{\{0,1\}^n}$ by:

$\forall l, l' \in \{0,1\}^n \quad l \preceq l' \Leftrightarrow (\forall i \ l(i) \le l'(i))$

$\forall g, g' \in \mathcal{G}_{\{0,1\}^n} \quad g \preceq g' \Leftrightarrow (\forall t \in T(\mathcal{F}) \ g(t) \preceq g'(t))$

The problem we want to deal with is to decide whether or not there exists a
least generalized tree set w.r.t. $\preceq$ in $\mathcal{L}(\mathcal{A})$. To this aim, we first build a minimal
solution if it exists, and second, we verify that this solution is unique.

Let $\omega$ be a subset of states such that COND($\omega$) (see the sketch of proof
page 160). Let $\mathcal{A}_\omega = (\omega, \Delta_\omega, 2^\omega)$ be the generalized tree set automaton $\mathcal{A}$
restricted to state set $\omega$.

Now let $\Delta_\omega^{min}$ be defined by: for each $(q_1, \ldots, q_p, f) \in \omega^p \times \mathcal{F}_p$, choose in the
set $\Delta_\omega$ one rule $(q_1, \ldots, q_p, f, l, q)$ such that $l$ is minimal w.r.t. $\preceq$. Let $\mathcal{A}_\omega^{min} =
(\omega, \Delta_\omega^{min}, 2^\omega)$. Consequently,

1. There exists only one run $r_\omega$ on a unique generalized tree set $g_\omega$ in
$\mathcal{A}_\omega^{min}$ because for all $q_1, \ldots, q_p \in \omega$ and $f \in \mathcal{F}_p$ there is only one rule
$(q_1, \ldots, q_p, f, l, q)$ in $\Delta_\omega^{min}$;

2. the run $r_\omega$ on $g_\omega$ is regular;

3. the generalized tree set $g_\omega$ is minimal w.r.t. $\preceq$ in $\mathcal{L}(\mathcal{A}_\omega)$.

Points 1 and 2 are straightforward. The third point follows from the fact
that $\mathcal{A}$ is deterministic. Indeed, let us suppose that there exists a run $r'$ on a
generalized tree set $g'$ such that $g' \prec g_\omega$. Therefore, $\forall t \ g'(t) \preceq g_\omega(t)$, and there
exists (w.l.o.g.) a minimal term $u = f(u_1, \ldots, u_p)$ w.r.t. the subterm ordering
such that $g'(u) \prec g_\omega(u)$. Since $\mathcal{A}$ is deterministic and $\forall v \lhd u \ g_\omega(v) = g'(v)$, we
have $r_\omega(u_i) = r'(u_i)$. Hence, the rule $(r_\omega(u_1), \ldots, r_\omega(u_p), f, g_\omega(u), r_\omega(u))$ is not
such that $g_\omega(u)$ is minimal in $\Delta_\omega$, which contradicts the hypothesis.

Consider the generalized tree sets $g_\omega$ for all subsets of states $\omega$ satisfying
COND($\omega$). If there is no such $g_\omega$, then there is no least generalized tree set $g$
in $\mathcal{L}(\mathcal{A})$. Otherwise, each generalized tree set defines an n-tuple of regular tree
languages and inclusion is decidable for regular tree languages. Hence we can
identify a minimal generalized tree set $g$ among all $g_\omega$. This GTS $g$ defines an
n-tuple $(F_1, \ldots, F_n)$ of regular tree languages. Let us remark that this construction
does not ensure that $(F_1, \ldots, F_n)$ is minimal in $\mathcal{L}(\mathcal{A})$.

There is a deterministic, simple generalized tree set automaton $\mathcal{A}'$ such that
$\mathcal{L}(\mathcal{A}')$ is the set of characteristic generalized tree sets of all $(L_1, \ldots, L_n)$ satisfying
$F_1 \subseteq L_1, \ldots, F_n \subseteq L_n$ (see Proposition 38). Let $\mathcal{A}''$ be the deterministic
generalized tree set automaton such that $\mathcal{L}(\mathcal{A}'') = \mathcal{L}(\mathcal{A}) \cap \mathcal{L}(\mathcal{A}')$ (see Proposition 35).
There exists a least generalized tree set w.r.t. $\preceq$ in $\mathcal{L}(\mathcal{A})$ if and only if
the generalized tree set automata $\mathcal{A}$ and $\mathcal{A}''$ are equivalent. Since equivalence
of generalized tree set automata is decidable (see Proposition 40) we get the
result. □

5.5 Bibliographical Notes 

We now survey decidability results for satisfiability of set constraints and some 
complexity issues. 

Decision procedures for solving set constraints arise with [Rey69], and Mishra
[Mis84]. The aim of these works was to obtain new tools for type inference and
type checking [AM91, Hei92, HJ90b, JM79, Mis84, Rey69].

First consider systems of set constraints of the form:

$X_1 = exp_1, \ldots, X_n = exp_n$ (5.8)

where the $X_i$ are distinct variables and the $exp_i$ are disjunctions of set expressions
of the form $f(X_{i_1}, \ldots, X_{i_p})$ with $f \in \mathcal{F}_p$. These systems of set constraints
are essentially tree automata, therefore they have a unique solution and each
$X_i$ is interpreted as a regular tree language.

Suppose now that the $exp_i$ are set expressions without complement symbols.
Such systems are always satisfiable and have a least solution which is regular.
For example, the system

$Nat = s(Nat) \cup 0$

$X = X \cap Nat$

$List = cons(X, List) \cup nil$

has a least solution

$Nat = \{s^i(0) \mid i \ge 0\}$, $X = \emptyset$, $List = \{nil\}$.
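
The least solution of this example can be approximated by iterating the right-hand sides from empty sets until nothing changes. The sketch below is only an illustration: it truncates all languages to terms of depth at most `MAX_DEPTH` so that the iteration terminates, and the term encoding (nested tuples) and all function names are assumptions of this sketch.

```python
from itertools import product

MAX_DEPTH = 4  # truncation bound, an assumption of this sketch

def depth(t):
    """Depth of a term encoded as (symbol, child, ..., child)."""
    return 1 + max((depth(c) for c in t[1:]), default=0)

def apply_symbol(f, *arg_sets):
    """f(S1, ..., Sp), restricted to terms of depth at most MAX_DEPTH."""
    return {(f, *args) for args in product(*arg_sets)
            if depth((f, *args)) <= MAX_DEPTH}

def step(env):
    """One application of the three right-hand sides of the system."""
    return {
        "Nat": apply_symbol("s", env["Nat"]) | {("0",)},
        "X": env["X"] & env["Nat"],
        "List": apply_symbol("cons", env["X"], env["List"]) | {("nil",)},
    }

def least_solution():
    """Iterate from empty sets until a fixpoint is reached."""
    env = {"Nat": set(), "X": set(), "List": set()}
    while True:
        nxt = step(env)
        if nxt == env:
            return env
        env = nxt
```

Within the depth bound, the iteration yields the terms 0, s(0), s(s(0)), ... for Nat, the empty set for X, and only nil for List, in agreement with the least solution stated above.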
[HJ90a] investigate the class of definite set constraints which are of the form
$exp \subseteq exp'$, where no complement symbol occurs and $exp'$ contains no set operation.
Definite set constraints have a least solution whenever they have a solution.
The algorithm presented in [HJ90a] provides a specific set of transformation
rules and, when there exists a solution, the result is a regular presentation of
the least solution, in other words a system of the form (5.8).

Solving definite set constraints is EXPTIME-complete [CP97]. Many developments
or improvements of Heintze and Jaffar's method have been proposed
and some are based on tree automata [DTT97].

The class of positive set constraints is the class of systems of set constraints
of the form $exp \subseteq exp'$, where no projection symbol occurs. In this case, when a
solution exists, set constraints do not necessarily have a least solution. Several
algorithms for solving systems in this class were proposed: [AW92] generalize
the method of [HJ90a], [GTT93, GTT99] give an automata-based algorithm,
and [BGW93] use the decision procedure for the first order theory of monadic
predicates. Results on the computational complexity of solving systems of set
constraints are presented in [AKVW93]. The systems form a natural
complexity hierarchy depending on the number of elements of $\mathcal{F}$ of each arity.
The problem of existence of a solution of a system of positive set constraints is
NEXPTIME-complete.

The class of positive and negative set constraints is the class of systems of set
constraints of the form $exp \subseteq exp'$ or $exp \not\subseteq exp'$, where no projection symbol
occurs. In this case, when a solution exists, set constraints do not necessarily
have either a minimal solution or a maximal solution. Let $\mathcal{F} = \{a, b()\}$.
Consider the system $(b(X) \subseteq X) \wedge (X \not\subseteq \bot)$; this system has no minimal
solution. Consider the system $(X \subseteq b(X) \cup a) \wedge (\top \not\subseteq X)$; this system has
no maximal solution. The satisfiability problem in this class turned out to
be much more difficult than the positive case. [AKW95] give a proof based
on a reachability problem involving Diophantine inequalities. NEXPTIME-completeness
was proved by [Ste94]. [CP94a] gives a proof based on the ideas
of [BGW93].

The class of positive set constraints with projections is the class of systems of
set constraints of the form $exp \subseteq exp'$ with projection symbols. Set constraints
of the form $f_i^{-1}(X) \subseteq Y$ can easily be solved, but the case of set constraints of
the form $X \subseteq f_i^{-1}(Y)$ is more intricate. The problem was proved decidable by
[CP94b].

The expressive power of these classes of set constraints has been studied and
the classes have been proved to be different [Sey94]. In [CK96, Koz93], an axiomatization is
proposed which enlightens the reader on relationships between many approaches
to set constraints.

Furthermore, set constraints have been studied from a logical and topological
point of view [Koz95, MGKW96]. This last paper combines set constraints with
Tarskian set constraints, a more general framework for which many complexity
results are proved or recalled. Tarskian set constraints involve variables, relation
and function symbols interpreted relative to a first order structure.

Topological characterizations of classes of GTSA recognizable sets have also
been studied in [Tom94, Sey94]. Every set in $\mathcal{R}_{SGTS}$ is a compact set and every
set in $\mathcal{R}_{GTS}$ is the intersection between a compact set and an open set. These
remarks also give characterizations for the different classes of set constraints.
Chapter 6

Tree Transducers 



6.1 Introduction 

Finite state transformations of words, also called a-transducers or rational transducers
in the literature, model many kinds of processes, such as coffee machines
or lexical translators. But these transformations are not powerful enough to
model syntax directed transformations, and compiler theory is an important
motivation for the study of finite state transformations of trees. Indeed, translation
of natural or computing languages is directed by syntactical trees, and a
translator from LaTeX into HTML is a tree transducer. Unfortunately, from a
theoretical point of view, tree transducers do not inherit nice properties of word
transducers, and the classification is very intricate. So, in the present chapter
we focus on some aspects. In Sections 6.2 and 6.3, toy examples introduce in
an intuitive way different kinds of transducers. In Section 6.2, we summarize
the main results in the word case. Indeed, this book is mainly concerned with trees,
but the word case is useful to understand the tree case and its difficulties. The
bimorphism characterization is the ideal illustration of the link between the
"machine" point of view and the "homomorphic" one. In Section 6.3, we motivate
and illustrate bottom-up and top-down tree transducers, using compilation
as leitmotiv. We precisely define and present the main classes of tree transducers
and their properties in Section 6.4, where we observe that general classes
are not closed under composition, mainly because of alternation of copying and
nondeterministic processing. Nevertheless most useful classes, such as those used in
Section 6.3, have closure properties. In Section 6.5 we present the homomorphic
point of view.

Most of the proofs are tedious and are omitted. This chapter is a very incomplete
introduction to tree transducers. Tree transducers are extensively studied
for themselves and for various applications. But as they are somewhat complicated
objects, we focus here on the definitions and main general properties. It is
useful for every theoretical computer scientist to know the main notions about tree
transducers, because they are the main model of syntax directed manipulations,
and the heart of software manipulations and interfaces is syntax directed.
Tree transducers are an essential frame to develop practical modular syntax directed
algorithms, though an effort of algorithmic engineering remains to be done.
Tree transducer theory can be fertilized by other areas or can be useful for
other areas (example: ground tree transducers for decidability of the first order
theory of ground rewriting). We will be happy if, after reading this chapter,
the reader wants further reading, such as the monograph of Z. Fülöp and H. Vogler
(December 1998 [FV98]).

6.2 The Word Case 

6.2.1 Introduction to Rational Transducers 

We assume that the reader roughly knows popular notions of language theory:
homomorphisms on words, finite automata, rational expressions, regular grammars.
See for example the recent survey of A. Mateescu and A. Salomaa [MS96].
A rational transducer is a finite word automaton W with output. In a word
automaton, a transition rule $f(q) \to q'(f)$ means "if W is in some state q, if it
reads the input symbol f, then it enters state q' and moves its head one symbol
to the right". For defining a rational transducer, it suffices to add an output,
and a transition rule $f(q) \to q'(m)$ means "if the transducer is in some state
q, if it reads the input symbol f, then it enters state q', writes the word m on
the output tape, and moves its head one symbol to the right". Remark that
with these notations, we identify a finite automaton with a rational transducer
which writes what it reads. Note that m is not necessarily a symbol but can
be a word, including the empty word. Furthermore, we assume that it is not
necessary to read an input symbol, i.e. we accept transition rules of the form
$\varepsilon(q) \to q'(m)$ ($\varepsilon$ denotes the empty word).

Graph presentations of finite automata are popular and convenient. So it is
for rational transducers. The rule $f(q) \to q'(m)$ will be drawn as an edge from q to q'
labelled f/m.


Example 54. (Language $L_1$) Let $\mathcal{F}$ = {(, ), ;, 0, 1, A, ..., Z}. In the following,
we will consider the language $L_1$ defined on $\mathcal{F}$ by the regular grammar (the
axiom is program):

program → ( instruct

instruct → LOAD register | STORE register | MULT register | ADD register

register → 1 tailregister

tailregister → 0 tailregister | 1 tailregister | ; instruct | )

(a → b | c is an abbreviation for the set of rules {a → b, a → c})
$L_1$ is recognized by the deterministic automaton $A_1$ of Figure 6.1. The semantics of
$L_1$ is well known: LOAD i loads the content of register i in the accumulator;
STORE i stores the content of the accumulator in register i; ADD i adds in the
accumulator the content of the accumulator and the content of register i; MULT
i multiplies in the accumulator the content of the accumulator and the content
of register i.

Figure 6.1: A recognizer of $L_1$
A rational transducer is a tuple $R = (Q, \mathcal{F}, \mathcal{F}', Q_i, Q_f, \Delta)$ where $Q$ is a set
of states, $\mathcal{F}$ and $\mathcal{F}'$ are finite nonempty sets of input letters and output letters,
$Q_i, Q_f \subseteq Q$ are sets of initial and final states and $\Delta$ is a set of transduction
rules of the following type:

$f(q) \to q'(m)$,

where $f \in \mathcal{F} \cup \{\varepsilon\}$, $m \in \mathcal{F}'^*$, $q, q' \in Q$.

$R$ is $\varepsilon$-free if there is no rule $f(q) \to q'(m)$ with $f = \varepsilon$ in $\Delta$.

The move relation $\to_R$ is defined by: let $t, t' \in \mathcal{F}^*$, $u \in \mathcal{F}'^*$, $q, q' \in Q$,
$f \in \mathcal{F}$, $m \in \mathcal{F}'^*$,

$(t q f t', u) \to_R (t f q' t', u m) \Leftrightarrow f(q) \to q'(m) \in \Delta$,

and $\to_R^*$ is the reflexive and transitive closure of $\to_R$. A (partial) transduction
of $R$ on $t t' t''$ is a sequence of move steps of the form $(t q t' t'', u) \to_R^* (t t' q' t'', u u')$.
A transduction of $R$ from $t \in \mathcal{F}^*$ into $u \in \mathcal{F}'^*$ is a transduction of the form
$(q t, \varepsilon) \to_R^* (t q', u)$ with $q \in Q_i$ and $q' \in Q_f$.

The relation $T_R$ induced by $R$ can now be formally defined by:

$T_R = \{(t, u) \mid (q t, \varepsilon) \to_R^* (t q', u)$ with $t \in \mathcal{F}^*, u \in \mathcal{F}'^*, q \in Q_i, q' \in Q_f\}$.

A relation in $\mathcal{F}^* \times \mathcal{F}'^*$ is a rational transduction if and only if it is induced
by some rational transducer. We also need the following definitions: let $t \in \mathcal{F}^*$,
$T_R(t) = \{u \mid (t, u) \in T_R\}$. The translated of a language $L$ is obviously the
language defined by $T_R(L) = \{u \mid \exists t \in L, u \in T_R(t)\}$.
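
To make the move relation concrete, the following sketch computes the set of outputs of an ε-free rational transducer on a given input word by following all possible moves letter by letter. The rule encoding (tuples `(f, q, q2, m)` standing for f(q) → q2(m)), the function name `transduce` and the toy rule set in the usage note are assumptions of this sketch. Transducers with ε-rules are deliberately excluded, since they may produce infinitely many outputs for one input.

```python
def transduce(rules, initial, final, word):
    """Return the set of outputs of an epsilon-free transducer on word.

    rules is a list of tuples (f, q, q2, m) standing for f(q) -> q2(m);
    initial and final are the sets of initial and final states.
    """
    # A configuration is (current state, output written so far).
    configs = {(q, "") for q in initial}
    for letter in word:
        configs = {(q2, out + m)
                   for (q, out) in configs
                   for (f, q1, q2, m) in rules
                   if f == letter and q1 == q}
    return {out for (q, out) in configs if q in final}
```

With the hypothetical one-state rule set `[("a", "p", "p", "b"), ("a", "p", "p", ""), ("b", "p", "p", "ab")]`, the input word `ab` is translated into the two outputs `bab` and `ab`, because the letter a may be rewritten into b or erased.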

Example 55.

Ex. 55.1 Let us name French-$L_1$ the translation of $L_1$ in French (LOAD is
translated into CHARGER and STORE into STOCKER). The transducer of Figure 6.2
realizes this translation. This example illustrates the use of rational transducers
as lexical transducers.

Ex. 55.2 Let us consider the rational transducer Diff defined by $Q = \{q_i, q_s, q_l, q_d\}$,
$\mathcal{F} = \mathcal{F}' = \{a, b\}$, $Q_i = \{q_i\}$, $Q_f = \{q_s, q_l, q_d\}$, and $\Delta$ is the set of rules:

type i: $a(q_i) \to q_i(a)$, $b(q_i) \to q_i(b)$

type s: $\varepsilon(q_i) \to q_s(a)$, $\varepsilon(q_i) \to q_s(b)$, $\varepsilon(q_s) \to q_s(a)$, $\varepsilon(q_s) \to q_s(b)$

type l: $a(q_i) \to q_l(\varepsilon)$, $b(q_i) \to q_l(\varepsilon)$, $a(q_l) \to q_l(\varepsilon)$, $b(q_l) \to q_l(\varepsilon)$

type d: $a(q_i) \to q_d(b)$, $b(q_i) \to q_d(a)$, $a(q_d) \to q_d(\varepsilon)$, $b(q_d) \to q_d(\varepsilon)$,
$\varepsilon(q_d) \to q_d(a)$, $\varepsilon(q_d) \to q_d(b)$.

It is easy to prove that $T_{Diff} = \{(m, m') \mid m \ne m', m, m' \in \{a, b\}^*\}$.

We give without proofs some properties of rational transducers. For more
details, see [Sal73] or [MS96] and Exercises 65, 66, 68 for 1, 4 and 5. The
homomorphic approach presented in the next section can be used as an elegant
way to prove 2 and 3 (Exercise 70).

Figure 6.2: A rational transducer from $L_1$ into French-$L_1$.

Proposition 46 (Main properties of rational transducers).

1. The class of rational transductions is closed under union but not closed
under intersection.

2. The class of rational transductions is closed under composition.

3. Regular languages and context-free languages are closed under rational
transduction.

4. Equivalence of rational transductions is undecidable.

5. Equivalence of deterministic rational transductions is decidable.

6.2.2 The Homomorphic Approach 

A bimorphism is defined as a triple $B = (\Phi, L, \Psi)$ where $L$ is a recognizable
language and $\Phi$ and $\Psi$ are homomorphisms. The relation induced by $B$ (also
denoted by $B$) is defined by $B = \{(\Phi(t), \Psi(t)) \mid t \in L\}$. Bimorphism $(\Phi, L, \Psi)$
is $\varepsilon$-free if $\Phi$ is $\varepsilon$-free (a homomorphism is $\varepsilon$-free if the image of a letter is
never reduced to $\varepsilon$). Two bimorphisms are equivalent if they induce the same
relation.

We can state the following theorem, generally known as Nivat's Theorem [Niv68]
(see Exercises 69 and 70 for a sketch of proof).

Theorem 45 (Bimorphism theorem). Given a rational transducer, an equivalent
bimorphism can be constructed. Conversely, any bimorphism defines a
rational transduction. The construction preserves $\varepsilon$-freeness.



Example 56.

Ex. 56.1 The relation $\{(a(ba)^n, a^n) \mid n \in \mathbb{N}\} \cup \{((ab)^n, b^{3n}) \mid n \in \mathbb{N}\}$ is
processed by transducer $R$ and bimorphism $B$ of Figure 6.3.

$\Phi(A) = a$, $\Phi(B) = ba$, $\Phi(C) = ab$

$\Psi(A) = \varepsilon$, $\Psi(B) = a$, $\Psi(C) = bbb$

Figure 6.3: Transducer $R$ and an equivalent bimorphism $B = \{(\Phi(t), \Psi(t)) \mid t \in
AB^* + CC^*\}$.
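
The bimorphism of Figure 6.3 can be explored by enumerating control words and applying both homomorphisms. The sketch below is only an illustration; the dictionary encoding of the homomorphisms and the function names are assumptions, and the enumeration is cut off at a chosen length of the control word.

```python
PHI = {"A": "a", "B": "ba", "C": "ab"}   # input homomorphism
PSI = {"A": "", "B": "a", "C": "bbb"}    # output homomorphism

def hom(h, word):
    """Apply a homomorphism, given by its images on letters, to a word."""
    return "".join(h[c] for c in word)

def control_words(max_len):
    """Words of AB* + CC* of length at most max_len."""
    for n in range(1, max_len + 1):
        yield "A" + "B" * (n - 1)
        yield "C" * n

def relation(max_len):
    """Pairs (PHI(t), PSI(t)) for the control words t enumerated above."""
    return {(hom(PHI, t), hom(PSI, t)) for t in control_words(max_len)}
```

For control words of length at most 2, this gives the pairs (a, ε), (aba, a), (ab, bbb) and (abab, bbbbbb).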



Ex. 56.2 Automaton $L$ of Figure 6.4 and morphisms $\Phi$ and $\Psi$ below define
a bimorphism equivalent to the transducer of Figure 6.2.

$\Phi(\beta)$ = (, $\Phi(\lambda)$ = LOAD, $\Phi(\sigma)$ = STORE, $\Phi(\mu)$ = MULT, $\Phi(\alpha)$ = ADD,
$\Phi(\rho)$ = ;, $\Phi(\omega)$ = 1, $\Phi(\zeta)$ = 0, $\Phi(\theta)$ = )

$\Psi(\beta)$ = (, $\Psi(\lambda)$ = CHARGER, $\Psi(\sigma)$ = STOCKER, $\Psi(\mu)$ = MULT, $\Psi(\alpha)$ = ADD,
$\Psi(\rho)$ = ;, $\Psi(\omega)$ = 1, $\Psi(\zeta)$ = 0, $\Psi(\theta)$ = )

Figure 6.4: The control automaton $L$.



Nivat's characterization of rational transducers makes intuitive sense. Automaton
$L$ can be seen as a control of the actions, morphism $\Psi$ can be seen as an
output function and $\Phi^{-1}$ as an input function. $\Phi^{-1}$ analyses the input (it is
a kind of lexical analyzer) and generates symbolic names; the regular
grammatical structure on these symbolic names is controlled by $L$. Examples 56.1
and 56.2 are an obvious illustration. $L$ is the common structure of the
English and French versions, $\Phi$ generates the English version and $\Psi$ generates
the French one. This idea is the major idea of compilation, but compilation of
computing languages or translation of natural languages is directed by syntax,
that is to say by syntactical trees. This is the motivation of the rest of the chapter.
But unfortunately, from a formal point of view, we will lose most of the
best results of the word case. The power of non-linear tree transducers will explain
in part this complication, but even in the linear case, there is a new phenomenon
in trees, the understanding of which can be introduced by the "problem of
homomorphism inversion" that we describe in Exercise 71.



6.3 Introduction to Tree Transducers 

Tree transducers and their generalizations model many syntax directed transformations
(see exercises). We use here a toy example of a compiler to illustrate
how usual tree transducers can be considered as modules of compilers.

We consider a simple class of arithmetic expressions (with usual syntax) as
source language. We assume that this language is analyzed by an LL(1) parser. We
consider two target languages: $L_1$ defined in Example 54 and another language
$L_2$. A transducer $A$ translates syntactical trees into abstract trees (Figure 6.5).
A second tree transducer $R$ illustrates how tree transducers can be seen as
part of compilers which compute attributes over abstract trees. It decorates
abstract trees with numbers of registers (Figure 6.7). Thus $R$ translates abstract
trees into attributed abstract trees. After that, tree transducers $T_1$ and $T_2$
generate target programs in $L_1$ and $L_2$, respectively, starting from attributed
abstract trees (Figures 6.7 and 6.8). $T_2$ is an example of a nonlinear transducer.
Target programs are yields of generated trees. So composition of transducers
models succession of compilation passes, and when a class of transducers is closed
under composition (see Section 6.4), we get universal constructions to reduce the
number of compiler passes and to meta-optimize compilers.

We now define the source language. Let us consider the terminal alphabet
{(, ), +, ×, a, b, ..., z}. First, the context-free word grammar $G_1$ is defined by
rules (E is the axiom):

E → M | M + E

M → F | F × M

F → I | (E)

I → a | b | ··· | z

Another context-free word grammar $G_2$ is defined by (E is the axiom):

E → M E'

E' → + E | ε

M → F M'

M' → × M | ε

F → I | (E)

I → a | b | ··· | z

Let E be the axiom of $G_1$ and $G_2$. The semantics of these two grammars
is obvious. It is easy to prove that they are equivalent, i.e. they define the
same source language. On the one hand, $G_1$ is more natural; on the other
hand $G_2$ could be preferred for syntactical analysis reasons, because $G_2$ is LL(1)
and $G_1$ is not LL. We consider syntactical trees as derivation trees for the
grammar $G_2$. Let us consider the word u = (a + b) × c of the source
language. We define the abstract tree associated with u as the tree ×(+(a, b), c)
defined over $\mathcal{F}$ = {+(,), ×(,), a, b, c}. Abstract trees are ground terms over
$\mathcal{F}$. It is easier to evaluate expressions or compute attributes over abstract trees than over
syntactical trees. The following transformation associates with a syntactical
tree t its corresponding abstract tree A(t).



I(x) → x                      F(x) → x

M(x, M'(ε)) → x               E(x, E'(ε)) → x

M(x, M'(×, y)) → ×(x, y)      E(x, E'(+, y)) → +(x, y)

F((, x, )) → x


We have not precisely defined the use of the arrow →, but it is intuitive.
Likewise we introduce examples before definitions of different kinds of tree transducers
(Section 6.4 supplies a formal frame).
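
As an informal illustration of this transformation, the rewrite rules can be read as a recursive function on derivation trees of the second grammar. The tuple encoding of trees, the name `abstract`, the constant `("eps",)` for the empty word and the ASCII symbol `*` for the multiplication sign are assumptions of this sketch.

```python
def abstract(t):
    """Map a derivation tree of G2 to its abstract tree.

    Trees are tuples (symbol, child, ..., child); constants are 1-tuples.
    """
    head, kids = t[0], t[1:]
    if head == "I" or (head == "F" and len(kids) == 1):
        return abstract(kids[0])             # I(x) -> x and F(x) -> x
    if head == "F":
        return abstract(kids[1])             # F((, x, )) -> x
    if head in ("M", "E"):
        left, tail = kids                    # tail is M'(...) or E'(...)
        if len(tail) == 2:                   # M'(eps) or E'(eps)
            return abstract(left)
        op, right = tail[1], tail[2]         # M'(*, y) or E'(+, y)
        return (op[0], abstract(left), abstract(right))
    return t                                 # constants a, b, ..., z

EPS = ("eps",)

def ident(name):
    """Derivation tree M(F(I(name)), M'(eps)) of a single identifier."""
    return ("M", ("F", ("I", (name,))), ("M'", EPS))

# Derivation tree of the word (a + b) * c.
A_PLUS_B = ("E", ident("a"), ("E'", ("+",), ("E", ident("b"), ("E'", EPS))))
TREE = ("E",
        ("M", ("F", ("(",), A_PLUS_B, (")",)), ("M'", ("*",), ident("c"))),
        ("E'", EPS))
```

Applied to `TREE`, the function returns the abstract tree with root `*`, whose left subtree is +(a, b) and whose right subtree is c.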

To illustrate nondeterminism, let us introduce two new transducers $A$ and $A'$.
Some brackets are optional in the source language, hence $A'$ is nondeterministic.
Note that $A$ works from frontier to root and $A'$ works from root to frontier.
A: an Example of Bottom-up Tree Transducer

The following linear deterministic bottom-up tree transducer $A$ carries out
transformation of derivation trees for $G_2$ into the corresponding abstract trees.
Empty word ε is identified as a constant symbol in syntactical trees. States of
$A$ are $q$, $q_\varepsilon$, $q_I$, $q_F$, $q_M$, $q_{M'\varepsilon}$, $q_{E'\varepsilon}$, $q_E$, $q_\times$, $q_{M'\times}$, $q_+$, $q_{E'+}$, $q_($, and $q_)$. Final state is
$q_E$. The set of transduction rules is:

$a \to q(a)$                                    $b \to q(b)$
$c \to q(c)$                                    $\varepsilon \to q_\varepsilon(\varepsilon)$
$) \to q_)())$                                  $( \to q_((()$
$+ \to q_+(+)$                                  $\times \to q_\times(\times)$
$I(q(x)) \to q_I(x)$                            $F(q_I(x)) \to q_F(x)$
$M'(q_\varepsilon(x)) \to q_{M'\varepsilon}(x)$           $E'(q_\varepsilon(x)) \to q_{E'\varepsilon}(x)$
$M(q_F(x), q_{M'\varepsilon}(y)) \to q_M(x)$       $E(q_M(x), q_{E'\varepsilon}(y)) \to q_E(x)$
$M'(q_\times(x), q_M(y)) \to q_{M'\times}(y)$      $M(q_F(x), q_{M'\times}(y)) \to q_M(\times(x, y))$
$E'(q_+(x), q_E(y)) \to q_{E'+}(y)$             $E(q_M(x), q_{E'+}(y)) \to q_E(+(x, y))$
$F(q_((x), q_E(y), q_)(z)) \to q_F(y)$



The notion of (successful) run is an intuitive generalization of the notion
of run for finite tree automata. The reader should note that FTAs can be
considered as a special case of bottom-up tree transducers whose output is equal
to the input. We give in Figure 6.5 an example of run of $A$ which translates the
derivation tree t which yields (a + b) × c for context-free grammar $G_2$ into the
corresponding abstract tree ×(+(a, b), c).





Figure 6.5: Example of run of A 



A': an Example of Top-down Tree Transducer 

The inverse transformation $A^{-1}$, which computes the set of derivation trees of
$G_2$ associated with an abstract tree, is computed by a nondeterministic top-down
tree transducer $A'$. The states of $A'$ are $q_E$, $q_F$, $q_M$. The initial state is
$q_E$. The set of transduction rules is:

$q_E(x) \to E(q_M(x), E'(\varepsilon))$        $q_E(+(x, y)) \to E(q_M(x), E'(+, q_E(y)))$
$q_M(x) \to M(q_F(x), M'(\varepsilon))$        $q_M(\times(x, y)) \to M(q_F(x), M'(\times, q_M(y)))$
$q_F(x) \to F((, q_E(x), ))$                   $q_F(a) \to F(I(a))$
$q_F(b) \to F(I(b))$                           $q_F(c) \to F(I(c))$

Transducer $A'$ is nondeterministic because there are ε-rules like $q_E(x) \to
E(q_M(x), E'(\varepsilon))$. We give in Figure 6.6 an example of run of $A'$ which transforms
the abstract tree +(a, ×(b, c)) into a syntactical tree t' of the word a + b × c.


Figure 6.6: Example of run of $A'$



Compilation 

The compiler now transforms abstract trees into programs for some target languages.
We consider two target languages. The first one is $L_1$ of Example 54.
To simplify, we omit ";", because they are not necessary (we introduced semicolons
in Section 6.2 to avoid ε-rules, but this is a technical detail, because word
and tree automata with ε-rules are equivalent to usual ones). The second target
language is another very simple language $L_2$, namely sequences of two instructions
+(i, j, k) (put the sum of contents of registers i and j in the register k) and
×(i, j, k). In a first pass, we attribute to each node of the abstract tree the minimal
number of registers necessary to compute the corresponding subexpression
in the target language. The second pass generates target programs.



First pass: computation of register numbers by a deterministic linear bottom-up
transducer $R$.

States of a tree automaton can be considered as values of (finitely valued)
attributes, but the formalism of tree automata does not allow decorating
nodes of trees with the corresponding values. On the other hand, this decoration
is easy with a transducer. Computation of finitely valued inherited
(respectively synthesized) attributes is modeled by top-down (respectively
bottom-up) tree transducers. Here, we use a bottom-up tree transducer
$R$. States of $R$ are $q_0, \ldots, q_n$. All states are final states. The set of rules
is:

$a \to q_0(a)$        $b \to q_0(b)$        $c \to q_0(c)$

$+(q_i(x), q_i(y)) \to q_{i+1}(\overset{i+1}{+}(x, y))$        $\times(q_i(x), q_i(y)) \to q_{i+1}(\overset{i+1}{\times}(x, y))$

if i > j:
$+(q_i(x), q_j(y)) \to q_i(\overset{i}{+}(x, y))$        $\times(q_i(x), q_j(y)) \to q_i(\overset{i}{\times}(x, y))$

if i < j, we permute the order of subtrees:
$+(q_i(x), q_j(y)) \to q_j(\overset{j}{+}(y, x))$        $\times(q_i(x), q_j(y)) \to q_j(\overset{j}{\times}(y, x))$

A run $t \to_R^* q_i(u)$ means that i registers are necessary to evaluate t. The root
of t is then relabelled in u by symbol $\overset{i}{+}$ or $\overset{i}{\times}$.

Second pass: generation of target programs in $L_1$ or $L_2$, by top-down deterministic
transducers $T_1$ and $T_2$. $T_1$ contains only one state q. The set
of rules of $T_1$ is:

$q(\overset{i}{+}(x, y)) \to \diamond(q(x), STOREi, q(y), ADDi, STOREi)$
$q(\overset{i}{\times}(x, y)) \to \diamond(q(x), STOREi, q(y), MULTi, STOREi)$

$q(a) \to \diamond(LOAD, a)$

$q(b) \to \diamond(LOAD, b)$

$q(c) \to \diamond(LOAD, c)$

where $\diamond(, , , , )$ and $\diamond(, )$ are new symbols.

The state set of $T_2$ is {q, q'} where q is the initial state. The set of rules of $T_2$ is:

$q(\overset{i}{+}(x, y)) \to \#(q(x), q(y), +, (, q'(x), q'(y), i, ))$        $q'(\overset{i}{+}(x, y)) \to i$

$q(\overset{i}{\times}(x, y)) \to \#(q(x), q(y), \times, (, q'(x), q'(y), i, ))$        $q'(\overset{i}{\times}(x, y)) \to i$

$q(a) \to \varepsilon$        $q'(a) \to a$

$q(b) \to \varepsilon$        $q'(b) \to b$

$q(c) \to \varepsilon$        $q'(c) \to c$

where # is a new symbol of arity 8.

The reader should note that target programs are words formed with leaves
of trees, i.e. yields of trees. Examples of transductions computed by $T_1$ and
$T_2$ are given in Figures 6.7 and 6.8. The reader should also note that $T_1$
is a homomorphism. Indeed, a homomorphism can be considered as
a particular case of deterministic transducer, namely a transducer with
only one state (we can consider it as bottom-up as well as top-down). The
reader should also note that $T_2$ is deterministic but not linear.
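
The two passes can be sketched as recursive functions on abstract trees. This is only an illustration under several assumptions: trees are nested tuples, a decorated node is encoded as `(op, i, x, y)`, the function names are hypothetical, `*` stands for the multiplication sign, and the functions return directly the yields (lists of instructions) instead of the output trees of the two top-down transducers. The input tree in the usage note is also an assumption, chosen so that the results agree with the programs listed under Figures 6.7 and 6.8.

```python
def decorate(t):
    """First pass: return (i, u), the register count i and decorated tree u."""
    if len(t) == 1:                       # constants a, b, c go to state q0
        return 0, t
    op, left, right = t
    i, x = decorate(left)
    j, y = decorate(right)
    if i == j:
        return i + 1, (op, i + 1, x, y)
    if i > j:
        return i, (op, i, x, y)
    return j, (op, j, y, x)               # i < j: permute the subtrees

def yield_t1(u):
    """Yield of the tree produced by the one-state transducer for L1."""
    if len(u) == 1:
        return ["LOAD" + u[0]]
    op, i, x, y = u
    instr = "ADD" if op == "+" else "MULT"
    return (yield_t1(x) + [f"STORE{i}"] + yield_t1(y)
            + [f"{instr}{i}", f"STORE{i}"])

def name(u):
    """State q': a constant names itself, a decorated node its register."""
    return u[0] if len(u) == 1 else str(u[1])

def yield_t2(u):
    """Yield of the tree produced from state q for L2 (empty words dropped)."""
    if len(u) == 1:
        return []                          # q(a) -> empty word
    op, i, x, y = u
    return yield_t2(x) + yield_t2(y) + [f"{op}({name(x)}{name(y)}{i})"]
```

For the abstract tree `("+", ("*", ("+", ("b",), ("c",)), ("d",)), ("a",))`, `decorate` reports one register, `yield_t1` returns the thirteen instructions LOADb, STORE1, LOADc, ADD1, STORE1, STORE1, LOADd, MULT1, STORE1, STORE1, LOADa, ADD1, STORE1, and `yield_t2` returns +(bc1), *(1d1), +(1a1). The function `yield_t2` visits each subtree twice (once through itself, once through `name`), which reflects the non-linearity of the second transducer.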



6.4 Properties of Tree Transducers 

6.4.1 Bottom-up Tree Transducers 

We now give formal definitions. In this section, we consider academic examples,
without intuitive semantics, to illustrate phenomena and properties. Tree transducers
generalize both word transducers and tree automata. We first
consider bottom-up tree transducers. A transition rule of a NFTA is of the type
$f(q_1(x_1), \ldots, q_n(x_n)) \to q(f(x_1, \ldots, x_n))$. Here we extend the definition (as we
did in the word case), accepting to change symbol $f$ into any term.

Figure 6.7: Decoration with synthesized attributes of an abstract tree, and
translation into a target program of $L_1$. In the tree, L stands for LOAD, S stands
for STORE, A stands for ADD, M stands for MULT. The corresponding program is the
yield of this tree:
LOADb STORE1 LOADc ADD1 STORE1 STORE1 LOADd MULT1 STORE1 STORE1 LOADa
ADD1 STORE1

Figure 6.8: Translation of an abstract tree into a target program of $L_2$. The
corresponding program is the yield of this tree: +(bc1) ×(1d1) +(1a1)

A bottom-up Tree Transducer (NUTT) is a tuple U = (Q, F, F', Qf, Δ)
where Q is a set of (unary) states, F and F' are finite nonempty sets of input
symbols and output symbols, Qf ⊆ Q is a set of final states and Δ is a set of
transduction rules of the following two types:

f(q1(x1), ..., qn(xn)) → q(u),

where f ∈ F_n, u ∈ T(F', X_n), q, q1, ..., qn ∈ Q, or

q(x1) → q'(u) (ε-rule),

where u ∈ T(F', X_1), q, q' ∈ Q.

As for NFTA, there is no initial state, because when a symbol is a leaf a
(i.e. a constant symbol), transduction rules are of the form a → q(u), where
u is a ground term. These rules can be considered as "initial rules". Let
t, t' ∈ T(F ∪ F' ∪ Q). The move relation →_U is defined by: t →_U t' if and only if

∃ f(q1(x1), ..., qn(xn)) → q(u) ∈ Δ
∃ C ∈ C(F ∪ F' ∪ Q)
∃ u1, ..., un ∈ T(F')
t = C[f(q1(u1), ..., qn(un))]
t' = C[q(u{x1←u1, ..., xn←un})]

This definition includes the case of ε-rule as a particular case. The reflexive
and transitive closure of →_U is →*_U. A transduction of U from a ground term
t ∈ T(F) to a ground term t' ∈ T(F') is a sequence of move steps of the form
t →*_U q(t'), such that q is a final state. The relation induced by U is the relation
(also denoted by U) defined by:

U = {(t, t') | t →*_U q(t'), t ∈ T(F), t' ∈ T(F'), q ∈ Qf}.

The domain of U is the set {t ∈ T(F) | (t, t') ∈ U}. The image by U of a
set of ground terms L is the set U(L) = {t' ∈ T(F') | ∃t ∈ L, (t, t') ∈ U}.

A transducer is ε-free if it contains no ε-rule. It is linear if all transition
rules are linear (no variable occurs twice in the right-hand side). It
is non-erasing if, for every rule, at least one symbol of F' occurs in the
right-hand side. It is said to be complete (or non-deleting) if, for every rule
f(q1(x1), ..., qn(xn)) → q(u), for every xi (1 ≤ i ≤ n), xi occurs at least once
in u. It is deterministic (DUTT) if it is ε-free and there are no two rules with
the same left-hand side.

Example 57. 

Ex. 57.1 Tree transducer A defined in Section 6.3 is a linear DUTT. Tree
transducer R in Section 6.3 is a linear and complete DUTT.

Ex. 57.2 States of U1 are q, q'; F = {f(), a}; F' = {g(,), f(), f'(), a}; q' is
the final state; the set of transduction rules is:

a → q(a)
f(q(x)) → q(f(x)) | q(f'(x)) | q'(g(x, x))

U1 is a complete, non-linear NUTT. We now give the transductions of
the ground term f(f(f(a))). For the sake of simplicity, fffa stands for
f(f(f(a))). We have:

U1({fffa}) = {g(ffa, ffa), g(ff'a, ff'a), g(f'fa, f'fa), g(f'f'a, f'f'a)}.

U1 illustrates an ability of NUTT, that we describe following Gécseg and
Steinby.

B1- "Nprocess and copy" A NUTT can first process an input subtree
nondeterministically and then make copies of the resulting
output tree.

Ex. 57.3 States of U2 are q, q'; F = F' = {f(), f'(), a}; q is the final state;
the set of transduction rules is defined by:

a → q(a)
f(q(x)) → q'(a)
f'(q'(x)) → q(a)

U2 is a non-complete DUTT. The tree transformation induced by U2 is

{(t, a) | t is accepted by the DFTA of final state q and rules
a → q(a), f(q(x)) → q'(f(x)), f'(q'(x)) → q(f'(x))}.

B2- "check and delete" A NUTT can first check regular constraints
on input subterms and delete these subterms afterwards.



Bottom-up tree transducers translate the input trees from leaves to root, so
bottom-up tree transducers are also called frontier-to-root transducers.
Top-down tree transducers work in the opposite direction.
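The move relation of an ε-free NUTT can be simulated by a direct recursion from
the leaves to the root. The following Python sketch is only an illustration
under stated assumptions: trees are encoded as nested tuples, a rule table maps
a symbol and the tuple of states reached on its subtrees to the possible right-hand
sides, and an integer k in a right-hand side stands for the variable x_{k+1}. The
names substitute, run, transduce, U1 and U2 are hypothetical, and ε-rules are not
handled. The tables encode the rules of Ex. 57.2 and Ex. 57.3.

```python
from itertools import product

def substitute(template, outputs):
    """Replace each variable index k in `template` by outputs[k]."""
    if isinstance(template, int):
        return outputs[template]
    return (template[0],) + tuple(substitute(c, outputs) for c in template[1:])

def run(rules, tree):
    """Set of pairs (state, output tree) reachable at the root of `tree`."""
    symbol, children = tree[0], tree[1:]
    child_results = [run(rules, c) for c in children]
    results = set()
    for combo in product(*child_results):      # one (state, output) per child
        states = tuple(s for s, _ in combo)
        outputs = tuple(o for _, o in combo)
        for state, template in rules.get((symbol, states), []):
            results.add((state, substitute(template, outputs)))
    return results

def transduce(rules, finals, tree):
    """Outputs produced together with a final state."""
    return {out for state, out in run(rules, tree) if state in finals}

# Ex. 57.2: a -> q(a), f(q(x)) -> q(f(x)) | q(f'(x)) | q'(g(x, x)); final q'.
U1 = {
    ("a", ()): [("q", ("a",))],
    ("f", ("q",)): [("q", ("f", 0)), ("q", ("f'", 0)), ("q'", ("g", 0, 0))],
}
# Ex. 57.3: a -> q(a), f(q(x)) -> q'(a), f'(q'(x)) -> q(a); final q.
U2 = {
    ("a", ()): [("q", ("a",))],
    ("f", ("q",)): [("q'", ("a",))],
    ("f'", ("q'",)): [("q", ("a",))],
}

FFFA = ("f", ("f", ("f", ("a",))))
for out in sorted(transduce(U1, {"q'"}, FFFA)):
    print(out)
```

On fffa, the sketch produces only trees of the form g(u, u): the nondeterministic
choice between f and f' is made once, before the copy. With U2, an input is
mapped to a exactly when its alternation of f and f' is accepted, and the checked
subterm is then discarded.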

6.4.2 Top-down Tree Transducers 

A top-down Tree Transducer (NDTT) is a tuple D = (Q, F, F', Qi, Δ) where
Q is a set of (unary) states, F and F' are finite nonempty sets of input
symbols and output symbols, Qi ⊆ Q is a set of initial states and Δ is a set of
transduction rules of the following two types:

q(f(x1, ..., xn)) → u[q1(x_{i1}), ..., qp(x_{ip})],

where f ∈ F_n, u ∈ C^p(F'), q, q1, ..., qp ∈ Q, x_{i1}, ..., x_{ip} ∈ X_n, or

q(x) → u[q1(x), ..., qp(x)] (ε-rule),

where u ∈ C^p(F'), q, q1, ..., qp ∈ Q, x ∈ X.

As for top-down NFTA, there is no final state, because when a symbol is a
leaf a (i.e. a constant symbol), transduction rules are of the form q(a) → u,
where u is a ground term. These rules can be considered as "final rules". Let
t, t' ∈ T(F ∪ F' ∪ Q). The move relation →_D is defined by: t →_D t' if and only if

∃ q(f(x1, ..., xn)) → u[q1(x_{i1}), ..., qp(x_{ip})] ∈ Δ
∃ C ∈ C(F ∪ F' ∪ Q)
∃ u1, ..., un ∈ T(F)
t = C[q(f(u1, ..., un))]
t' = C[u[q1(v1), ..., qp(vp)]] where vj = uk if x_{ij} = xk



This definition includes the case of ε-rule as a particular case. →*_D is the
reflexive and transitive closure of →_D. A transduction of D from a ground term
t ∈ T(F) to a ground term t' ∈ T(F') is a sequence of move steps of the form
q(t) →*_D t', where q is an initial state. The transformation induced by D is the
relation (also denoted by D) defined by:

D = {(t, t') | q(t) →*_D t', t ∈ T(F), t' ∈ T(F'), q ∈ Qi}.

The domain of D is the set {t ∈ T(F) | (t, t') ∈ D}. The image of a set
of ground terms L by D is the set D(L) = {t' ∈ T(F') | ∃t ∈ L, (t, t') ∈ D}.
ε-free, linear, non-erasing, complete (or non-deleting), deterministic top-down
tree transducers are defined as in the bottom-up case.



Example 58. 

Ex. 58.1 Tree transducers A', T1, T2 defined in Section 6.3 are examples of
NDTT.

Ex. 58.2 Let us now define a non-deterministic and non-linear NDTT D1.
States of D1 are q, q'. The set of input symbols is F = {f(), a}. The set of
output symbols is F' = {g(,), f(), f'(), a}. The initial state is q. The set
of transduction rules is:

q(f(x)) → g(q'(x), q'(x)) (copying rule)
q'(f(x)) → f(q'(x)) | f'(q'(x)) (non-deterministic relabeling)
q'(a) → a

D1 transduces f(f(f(a))) (or briefly fffa) into the set of 16 trees:

{g(ffa, ffa), g(ffa, ff'a), g(ffa, f'fa), ..., g(f'f'a, f'fa), g(f'f'a, f'f'a)}.

D1 illustrates a new property.

D- "copy and Nprocess" A NDTT can first make copies of an
input subtree and then process different copies independently and
nondeterministically.
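The 16 outputs of D1 can be enumerated by a recursion from the root to the
leaves. The following Python sketch is an illustration under assumptions, not
a reference implementation: trees are nested tuples, a rule table maps a pair
(state, symbol) to its right-hand sides, and inside a right-hand side a pair
(state, k) with k an integer stands for the call state(x_{k+1}). The names
is_call, expand, transduce and D1 are hypothetical, and ε-rules are not handled.

```python
from itertools import product

def is_call(node):
    """A pair (state, k) denotes the state applied to the variable x_{k+1}."""
    return len(node) == 2 and isinstance(node[1], int)

def expand(rules, rhs, children):
    """All output trees obtained from `rhs` by transducing the called subtrees."""
    if is_call(rhs):
        state, k = rhs
        return transduce(rules, state, children[k])
    options = [expand(rules, sub, children) for sub in rhs[1:]]
    return {(rhs[0],) + combo for combo in product(*options)}

def transduce(rules, state, tree):
    """All outputs of the transducer started in `state` at the root of `tree`."""
    results = set()
    for rhs in rules.get((state, tree[0]), []):
        results |= expand(rules, rhs, tree[1:])
    return results

# Ex. 58.2: q(f(x)) -> g(q'(x), q'(x)); q'(f(x)) -> f(q'(x)) | f'(q'(x)); q'(a) -> a.
D1 = {
    ("q", "f"): [("g", ("q'", 0), ("q'", 0))],
    ("q'", "f"): [("f", ("q'", 0)), ("f'", ("q'", 0))],
    ("q'", "a"): [("a",)],
}

FFFA = ("f", ("f", ("f", ("a",))))
outputs = transduce(D1, "q", FFFA)
print(len(outputs))
```

Each occurrence of q'(x) in the copying rule is expanded separately, so the two
arguments of g are chosen independently among the four relabelings of ffa.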
6.4.3 Structural Properties

In this section, we use tree transducers U1, U2 and D1 of the previous section in
order to point out differences between top-down and bottom-up tree transducers.

Theorem 46 (Comparison Theorem).

1. There is no top-down tree transducer equivalent to U1 or to U2.

2. There is no bottom-up tree transducer equivalent to D1.

3. Any linear top-down tree transducer is equivalent to a linear bottom-up tree
transducer. In the linear complete case, classes of bottom-up and top-down
tree transducers are equal.

It is not hard to verify that neither NUTT nor NDTT are closed under
composition. Therefore, comparison of D-property "copy and Nprocess" and
U-property "Nprocess and copy" suggests an important question:

does alternation of copying and non-determinism induce an infinite
hierarchy of transformations?

The answer is affirmative [Eng78, Eng82], but it was a relatively long-standing
open problem. The fact that top-down transducers copy before non-deterministic
processes, and bottom-up transducers copy after non-deterministic processes
(see Exercise 75) suggests too that we get by composition two intricate infinite
hierarchies of transformations. The following theorem summarizes the results.

Theorem 47 (Hierarchy theorem). By composition of NUTT, we get an
infinite hierarchy of transformations. Any composition of n NUTT can be
processed by composition of n + 1 NDTT, and conversely (i.e. any composition of n
NDTT can be processed by composition of n + 1 NUTT).

Transducer A' of Section 6.3 shows that it can be useful to consider ε-rules,
but usual definitions of tree transducers in the literature exclude this case of
non-determinism. This does not matter, because it is easy to check that all important
results of closure or non-closure hold simultaneously for general classes and
ε-free classes. Deleting is also a minor phenomenon. Indeed, it gives rise to the
"check and delete" property, which is specific to bottom-up transducers, but
it does not matter for the hierarchy theorem, which remains true if we consider
complete transducers.

Section 6.3 suggests that for practical use, non-determinism and non-linearity
are rare. Therefore, it is important to note that if we assume linearity or
determinism, the hierarchy of Theorem 47 collapses. The following results supply
algorithms to compose or simplify transducers.

Theorem 48 (Composition Theorem).

1. The class of linear bottom-up transductions is closed under composition.

2. The class of deterministic bottom-up transductions is closed under
composition.

3. The class of linear top-down transductions is included in the class of
linear bottom-up transductions. These classes are equivalent in the complete
case.

4. Any composition of deterministic top-down transductions is equivalent to
a deterministic complete top-down transduction composed with a linear
homomorphism.

The reader should note that bottom-up determinism and top-down
determinism are incomparable (see Exercise 72).

Recognizable tree languages play a crucial role because derivation trees of
context-free word grammars are recognizable. Fortunately, we get:

Theorem 49 (Recognizability Theorem). The domain of a tree transducer
is a recognizable tree language. The image of a recognizable tree language by a
linear tree transducer is recognizable.

6.4.4 Complexity Properties 

We now present some decidability and complexity results. As for structural
properties, the situation is more complicated than in the word case, especially
for top-down tree transducers. Most problems are intractable in the worst
case, but empirically not so complex in real cases, though there is a lack
of "algorithmic engineering" to get efficient algorithms. As in the word case,
emptiness is decidable, and equivalence is undecidable in the general case but is
decidable in the k-valued case (a transducer is k-valued if there is no tree which
is transduced into more than k different terms; so a deterministic transducer is a
particular case of 1-valued transducer).

Theorem 50 (Decidability and complexity). Emptiness of tree transductions
is decidable. Equivalence of k-valued tree transducers is decidable.

Emptiness for bottom-up transducers is essentially the same as emptiness
for tree automata and therefore PTIME-complete. Emptiness for top-down
transducers, however, is essentially the same as emptiness for alternating top-down
tree automata, giving DEXPTIME-completeness for emptiness. The PTIME complexity
of testing single-valuedness in the bottom-up case is established in Seidl
[Sei92]. Ramsey theory gives combinatorial properties on which equivalence
tests for k-valued tree transducers are based [Sei94a].

Theorem 51 (Equivalence Theorem). Equivalence of deterministic tree 
transducers is decidable. 

6.5 Homomorphisms and Tree Transducers 

Exercise 74 illustrates how decomposition of transducers using homomorphisms
can help to get composition results, but we are far from the nice bimorphism
theorem of the word case, and in the tree case, there is no illuminating
theorem, but many complicated partial statements. The seminal paper of Engelfriet
[Eng75] contains a lot of decomposition and composition theorems. Here, we
only present the most significant results.

A delabeling is a linear, complete, and symbol-to-symbol tree homomorphism
(see Section 1.4). This very special kind of homomorphism changes only
the label of the input letter and possibly the order of subtrees. A definition of tree
bimorphisms is not necessary: it is the same as in the word case. We get the
following characterization theorem. We say that a bimorphism is linear
(respectively complete, etc.) if the two morphisms are linear (respectively complete,
etc.).

Theorem 52. The class of bottom-up tree transductions is equivalent to the
class of bimorphisms (Φ, L, Ψ) where Φ is a delabeling.

The relation defined by (Φ, L, Ψ) is computed by a transduction which is linear
(respectively complete, ε-free) if Ψ is linear (respectively complete, ε-free).

Remark that Nivat's theorem illuminates the symmetry of word transductions:
the inverse relation of a rational transduction is a rational transduction.
In the tree case, non-linearity obviously breaks this symmetry, because a tree
transducer can copy an input tree and process several copies, but it can never
check equality of subtrees of an input tree. If we want to consider symmetric
relations, we have two main situations. In the non-linear case, it is easy to prove
that composition of two bimorphisms simulates a Turing machine. In the linear
and the linear complete cases, we get the following results.

Theorem 53 (Tree Bimorphisms).

1. The class LCFB of linear complete ε-free tree bimorphisms satisfies
LCFB ⊂ LCFB² = LCFB³.

2. The class LB of linear tree bimorphisms satisfies LB ⊂ LB² ⊂ LB³ ⊂
LB⁴ = LB⁵.

The proof of LCFB² = LCFB³ requires many refinements and we omit it.
To prove LCFB ⊂ LCFB² we use twice the same homomorphism Φ(a) =
a, Φ(f(x)) = f(x), Φ(g(x, y)) = g(x, y), Φ(h(x, y, z)) = g(x, g(y, z)).
For any subterms (t_1, ..., t_{2p+2}), let

t = h(t_1, t_2, h(t_3, t_4, h(t_{2i+1}, t_{2i+2}, ..., h(t_{2p-1}, t_{2p}, g(t_{2p+1}, t_{2p+2})) ...)))

and

t' = g(t_1, h(t_2, t_3, h(t_4, ..., h(t_{2i}, t_{2i+1}, h(t_{2i+2}, t_{2i+3}, ..., h(t_{2p}, t_{2p+1}, t_{2p+2}) ...)))).

We get t' ∈ (Φ ∘ Φ⁻¹)(t). Assume that Φ ∘ Φ⁻¹ can be processed by some
Ψ⁻¹ ∘ Ψ'. Consider for simplicity subterms t_i of kind f^m(a). Roughly, if lengths
of t_i are different enough, Ψ and Ψ' must be supposed linear complete. Suppose
that for some u we have Ψ(u) = t and Ψ'(u) = t', then for any context u'
of u, Ψ(u') is a context of t with an odd number of variables, and Ψ'(u') is
a context of t' with an even number of variables. That is impossible because
homomorphisms are linear complete.
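The claim that t and t' are related by applying Φ and then Φ⁻¹ amounts to
Φ(t) = Φ(t'), which can be checked mechanically on small instances. The
following Python sketch does so; the tuple encoding of trees and the helper
names phi, f_power, build_t and build_t_prime are assumptions made for the
illustration.

```python
def phi(tree):
    """Homomorphism of the proof: h(x, y, z) -> g(x, g(y, z)); a, f, g unchanged."""
    symbol, children = tree[0], [phi(c) for c in tree[1:]]
    if symbol == "h":
        x, y, z = children
        return ("g", x, ("g", y, z))
    return (symbol, *children)

def f_power(m):
    """The term f^m(a)."""
    t = ("a",)
    for _ in range(m):
        t = ("f", t)
    return t

def build_t(ts):
    """h(t1, t2, h(t3, t4, ... g(t_{2p+1}, t_{2p+2}) ...)); len(ts) = 2p + 2."""
    if len(ts) == 2:
        return ("g", ts[0], ts[1])
    return ("h", ts[0], ts[1], build_t(ts[2:]))

def build_t_prime(ts):
    """g(t1, h(t2, t3, h(t4, ... h(t_{2p}, t_{2p+1}, t_{2p+2}) ...)))."""
    def chain(rest):                      # rest has odd length, at least 3
        if len(rest) == 3:
            return ("h", *rest)
        return ("h", rest[0], rest[1], chain(rest[2:]))
    return ("g", ts[0], chain(ts[1:]))

subterms = [f_power(m) for m in range(1, 7)]      # p = 2, six subterms
t, t_prime = build_t(subterms), build_t_prime(subterms)
print(t != t_prime, phi(t) == phi(t_prime))
```

Both images are the same right comb of g over t_1, ..., t_{2p+2}; the two trees
differ only in how the h nodes group the subterms, which is the shift by one
position exploited in the parity argument.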

Point 2 is a refinement of point 1 (see Exercise 79). 

This example shows a stronger fact: the relation cannot be processed by
any bimorphism, even non-linear, nor by any bottom-up transducer. A direct
characterization of these transformations is given in [AD82] by a special class of
top-down tree transducers, which are not linear but are "globally" linear, and
which are used to prove LCFB² = LCFB³.

6.6 Exercises

Exercises 65 to 71 are devoted to the word case, which is out of the scope of this
book. For this reason, we give precise hints for them.

Exercise 65. The class of rational transductions is closed under rational operations.
Hint: for closure under union, connect a new initial state to the initial states with
(ε, ε)-rules (parallel composition). For concatenation, connect in the same way final states
of the first transducer to initial states of the second (serial composition). For iteration,
connect final states to initial states (loop operation).

Exercise 66. The class of rational transductions is not closed under intersection. Hint:
consider rational transductions {(a^n b^p, a^n) | n, p ∈ N} and {(a^n b^p, a^p) | n, p ∈ N}.

Exercise 67. Equivalence of rational transductions is undecidable. Hint: Associate
the transduction T_P = {(f(u), g(u)) | u ∈ Σ+} with each instance P = (f, g) of the Post
Correspondence Problem such that T_P defines {(Φ(m), Ψ(m)) | m ∈ Σ*}. Consider
Diff of Example 55.2. Diff ≠ Diff ∪ T_P if and only if P satisfies the Post property.

Exercise 68. Equivalence of deterministic rational transductions is decidable. Hint:
design a pumping lemma to reduce the problem to a bounded one by suppression of
loops (if the difference of lengths between two transduced subwords is not bounded, two
transducers cannot be equivalent).

Exercise 69. Build a rational transducer equivalent to a bimorphism. Hint: let f(q) →
q'(f) be a transition rule of L. If Φ(f) = ε, introduce the transduction rule ε(q) → q'(Ψ(f)).
If Φ(f) = a_0 ... a_n, introduce new states q_1, ..., q_n and transduction rules a_0(q) →
q_1(ε), ..., a_i(q_i) → q_{i+1}(ε), ..., a_n(q_n) → q'(Ψ(f)).

Exercise 70. Build a bimorphism equivalent to a rational transducer. Hint: consider
the set Δ of transition rules as a new alphabet. We may speak of the first state q and
the second state q' in a letter "f(q) → q'(m)". The control language L is the set of
words over this alphabet, such that (i) the first state of the first letter is initial (ii) the
second state of the last letter is final (iii) in every two consecutive letters of a word,
the first state of the second equals the second state of the first. We define Φ and Ψ by
Φ(f(q) → q'(m)) = f and Ψ(f(q) → q'(m)) = m.

Exercise 71. Homomorphism inversion and applications. A homomorphism Φ is
non-increasing if for every symbol a, Φ(a) is the empty word or a symbol.

1. For any morphism Φ, find a bimorphism (Φ', L, Ψ) equivalent to Φ⁻¹, with Φ'
non-increasing, and such that furthermore Φ' is ε-free if Φ is ε-free. Hint: Φ⁻¹
is equivalent to a transducer R (Exercise 69), and the output homomorphism Φ'
associated to R as in Exercise 70 is non-increasing. Furthermore, if Φ is ε-free,
R and Φ' are ε-free.

2. Let Φ and Ψ be two homomorphisms. If Φ is non-increasing, build a transducer
equivalent to Ψ ∘ Φ⁻¹ (recall that this notation means that we apply Ψ before
Φ⁻¹). Hint and remark: as Φ is non-increasing, Φ⁻¹ satisfies the inverse
homomorphism property Φ⁻¹(MM') = Φ⁻¹(M)Φ⁻¹(M') (for any pair of words or
languages M and M'). This property can be used to do constructions "symbol
by symbol". Here, it suffices that the transducer associates Φ⁻¹(Ψ(a)) with a,
for every symbol a of the domain of Ψ.

3. Application: prove that classes of regular and context-free languages are closed
under bimorphisms (we admit that intersection of a regular language with a
regular or context-free language, is respectively regular or context-free).

4. Other application: prove that bimorphisms are closed under composition. Hint:
remark that for any application f and set E, {(x, f(x)) | f(x) ∈ E} = {(x, f(x)) |
x ∈ f⁻¹(E)}.

Exercise 72. We identify words with trees over symbols of arity 1 or 0. Let relations
U = {(f^n a, f^n a) | n ∈ N} ∪ {(f^n b, g^n b) | n ∈ N} and D = {(f f^n a, f f^n a) | n ∈
N} ∪ {(g f^n a, g f^n b) | n ∈ N}. Prove that U is a deterministic linear complete
bottom-up transduction but not a deterministic top-down transduction. Prove that D is a
deterministic linear complete top-down transduction but not a deterministic
bottom-up transduction.

Exercise 73. Prove point 3 of Comparison Theorem. Hint. Use rule-by-rule tech- 
niques as in Exercise 74. 

Exercise 74. Prove Composition Theorem. Hints: Prove 1 and 2 using composition
"rule-by-rule", illustrated as follows. States of A ∘ B are products of states of A
and states of B. Let f(q(x)) →_A q'(g(x, g(x, a))) and g(q1(x), g(q2(y), a)) →_B q4(u).
Subterms substituted to x and y in the composition must be equal, and determinism
implies q1 = q2. Then we build the new rule f((q, q1)(x)) →_{A∘B} (q', q4)(u). To prove 3
for example, associate q(g(x, y)) → u(q'(x), q''(y)) with g(q'(x), q''(y)) → q(u), and
conversely. For 4, using ad hoc kinds of "rule-by-rule" constructions, prove DDTT ⊂
DCDTT ∘ LHOM and LHOM ∘ DCDTT ⊂ DCDTT ∘ LHOM (L means linear, C complete,
D deterministic, and the suffix DTT means top-down tree transducer as usual).

Exercise 75. Prove NDTT = HOM ∘ NLDTT and NUTT = HOM ∘ NLBTT. Hint: to
prove NDTT ⊂ HOM ∘ NLDTT use a homomorphism H to produce in advance as many
copies of subtrees of the input tree as the NDTT may need, and then simulate it by a
linear NDTT.

Exercise 76. Use constructions of composition theorem to reduce the number of 
passes in process of Section 6.3. 

Exercise 77. Prove recognizability theorem. Hint: as in exercise 74, "naive" con- 
structions work. 

Exercise 78. Prove Theorem 52. Hint: "naive" constructions work. 

Exercise 79. Prove point 2 of Theorem 53. Hint: E denotes the class of
homomorphisms which are linear and symbol-to-symbol. L, LC, LCF denote linear, linear
complete, linear complete ε-free homomorphisms, respectively. Prove LCS = L ∘ E =
E ∘ L and E⁻¹ ∘ L ⊂ L ∘ E⁻¹. Deduce from these properties and from point 1 of
Theorem 53 that LB⁴ = E ∘ LCFB² ∘ E⁻¹. To prove that LB³ ≠ LB⁴, consider
Ψ1 ∘ Ψ2⁻¹ ∘ Φ ∘ Φ⁻¹ ∘ Ψ2 ∘ Ψ1⁻¹, where Φ is the homomorphism used in point 1 of
Theorem 53; Ψ1 is the identity on a, f(x), g(x, y), h(x, y, z), and Ψ1(e(x)) = x; Ψ2 is
the identity on a, f(x), g(x, y), and Ψ2(c(x, y, z)) = b(b(x, y), z).

Exercise 80. Sketch of proof of LCFB² = LCFB³ (difficult). Distance D(x, y, u) of
two nodes x and y in a tree u is the sum of the lengths of two branches which join x
and y to their youngest common ancestor in u. D(x, u) denotes the distance of x to
the root of u.

Let H be the class of deterministic top-down transducers T defined as follows: q_0, ..., q_n
are the states of the transducer, q_0 is the initial state. For every context u, consider the
result u_i of the run starting from q_i(u). ∃k, ∀ context u such that for every variable x
of u, D(x, u) > k:

• u_0 contains at least an occurrence of each variable of u,

• for any i, u_i contains at least a non-variable symbol,

• if two occurrences x' and x'' of a same variable x occur in u_i, D(x', x'', u_i) < k.

Remark that LCF is included in H and that there is no right-hand side of a rule with
two occurrences of the same variable associated with the same state. Prove that

1. LCF⁻¹ ⊆ Delabeling⁻¹ ∘ H

2. H ∘ Delabeling⁻¹ ⊆ Delabeling⁻¹ ∘ H

3. H ⊆ LCFB²

4. Conclude. Compare with Exercise 71.

Exercise 81. Prove that the image of a recognizable tree language by a linear tree 
transducer is recognizable. 

6.7 Bibliographic notes

First of all, let us note that several surveys have been devoted (at least in
part) to tree transducers over the last 25 years. J.W. Thatcher [Tha73], one of the main
pioneers, wrote the first one in 1973, and F. Gécseg and M. Steinby the latest one
in 1996 [GS96]. Transducers are also formally studied in the book of F. Gécseg
and M. Steinby [GS84] and in the survey of J.-C. Raoult [Rao92]. The survey of M.
Dauchet and S. Tison [DT92] develops links with homomorphisms.

In Section 6.2, some examples are inspired by the old survey of Thatcher,
because the seminal motivations remain, namely the modeling of compilers or, more
generally, of syntax-directed transformations such as interfacing software, which
are always up to date. Among the main precursors, we can distinguish Thatcher
[Tha73], W.S. Brainerd [Bra69], A. Aho, J.D. Ullman [AU71], M. A. Arbib, E.
G. Manes [AM78]. The first approaches were closely linked to the practice of compilation,
and in some way, present tree transducers are evolutions of generalized
syntax-directed translations (B.S. Baker [Bak78] for example), which translate trees
into strings. But the crucial role of tree structure has increased later.

Many generalizations have been introduced, for example generalized finite
state transformations which generalize both the top-down and the bottom-up
tree transducers (J. Engelfriet [Eng77]); modular tree transducers (H. Vogler
[EV91]); synchronized tree automata (K. Salomaa [Sal94]); alternating tree
automata (G. Slutzki [Slu85]); deterministic top-down tree transducers with
iterated look-ahead (G. Slutzki, S. Vágvölgyi [SV95]). Ground tree transducers
(GTT) are studied in Chapter 3 of this book. The first and the most natural
generalization was the introduction of top-down tree transducers with look-ahead.
We have seen that the "check and delete" property is specific to bottom-up tree
transducers, and that the lack of this property in the non-complete top-down
case induces non-closure under composition, even in the linear case (see
Composition Theorem). Top-down transducers with regular look-ahead are able to
recognize before the application of a rule at a node of an input tree whether
the subtree at a son of this node belongs to a given recognizable tree language.
This definition remains simple and gives to top-down transducers a property
equivalent to "check and delete".

The contribution of Engelfriet to the theory of tree transducers is important,
especially for the main composition, decomposition and hierarchy results ([Eng75,
Eng78, Eng82]).

We did not discuss complexity and decidability much in this chapter, because
the situation is classical. Since many problems are undecidable in the word
case, they are obviously undecidable in the tree case. Equivalence decidability
holds as in the word case for deterministic or finite-valued tree transducers (Z.
Zachar [Zac79], Z. Esik [Esi83], H. Seidl [Sei92, Sei94a]).

Chapter 7

Alternating Tree Automata 



7.1 Introduction 

Complementation of non-deterministic tree (or word) automata requires a
determinization step. This is due to an asymmetry in the definition. Two transition
rules with the same left-hand side can be seen as a single rule with a disjunctive
right-hand side. A run of the automaton on a given tree has to choose some member
of the disjunction. Basically, determinization gathers the disjuncts in a single
state.

Alternating automata restore some symmetry, allowing both disjunctions
and conjunctions in the right-hand sides. Then complementation is much easier:
it is sufficient to exchange the conjunction and disjunction signs, as well as
final and non-final states. In particular, nothing similar to determinization is
needed.
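The exchange of conjunctions and disjunctions is the dualization of a positive
formula, and its effect is an instance of De Morgan's laws: the dual formula
holds under an assignment exactly when the original formula fails under the
opposite assignment. The following Python sketch checks this on one formula; the
encoding of formulas as nested tuples and the names dual and evaluate are
assumptions made for the illustration.

```python
from itertools import product

# A formula is True, False, a state name (str), ("and", f, g) or ("or", f, g).

def dual(formula):
    """Exchange conjunctions with disjunctions, and true with false."""
    if isinstance(formula, bool):
        return not formula
    if isinstance(formula, str):
        return formula
    op, left, right = formula
    return ("or" if op == "and" else "and", dual(left), dual(right))

def evaluate(formula, true_states):
    """Truth value of `formula` when exactly the states in `true_states` hold."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        return formula in true_states
    op, left, right = formula
    l, r = evaluate(left, true_states), evaluate(right, true_states)
    return (l and r) if op == "and" else (l or r)

states = ["q0", "q1", "q2"]
phi = ("or", ("and", "q0", "q1"), "q2")
print(dual(phi))

# Duality: dual(phi) under the complementary assignment is the negation of phi.
for bits in product([False, True], repeat=len(states)):
    chosen = {s for s, b in zip(states, bits) if b}
    rest = set(states) - chosen
    assert evaluate(dual(phi), rest) == (not evaluate(phi, chosen))
```

Read with "a state holds" meaning "the automaton succeeds from that state", this
is why dualizing every transition and exchanging final and non-final states
yields an automaton that succeeds exactly where the original one fails.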

This nice formalism is more concise. The counterpart is that decision
problems are more complex, as we will see in Section 7.5.

There are other nice features: for instance, if we see a tree automaton as 
a finite set of monadic Horn clauses, then moving from non-deterministic to 
alternating tree automata consists in removing a very simple assumption on the 
clauses. This is explained in Section 7.6. In the same vein, removing another 
simple assumption yields two-way alternating tree automata, a more powerful 
device (yet not more expressive), as described in Section 7.6.3. 

Finally, we also show in Section 7.2.3 that, as far as emptiness is concerned, 
tree automata correspond to alternating word automata on a single-letter al- 
phabet, which shows the relationship between computations (runs) of a word 
alternating automaton and computations of a tree automaton. 



7.2 Definitions and Examples

7.2.1 Alternating Word Automata 

Let us start first with alternating word automata. 

If Q is a finite set of states, B+(Q) is the set of positive propositional formulas
over the set of propositional variables Q. For instance, q1 ∧ (q2 ∨ q3) ∧ (q2 ∨ q4) ∈
B+({q1, q2, q3, q4}).

Alternating word automata are defined as deterministic word automata,
except that the transition function is a mapping from Q × A to B+(Q) instead of
being a mapping from Q × A to Q. We assume a subset Q0 of Q of initial states
and a subset Qf of Q of final states.

Example 59. Assume that the alphabet is {0, 1} and the set of states is
{q0, q1, q2, q3, q4, q'1, q'2}, Q0 = {q0}, Qf = {q0, q1, q2, q3, q4} and the transitions
are:

q0 0 → (q0 ∧ q1) ∨ q'1       q0 1 → q0
q1 0 → q2                     q1 1 → true
q2 0 → q3                     q2 1 → q3
q3 0 → q4                     q3 1 → q4
q4 0 → true                   q4 1 → true
q'1 0 → q'1                   q'1 1 → q'2
q'2 0 → q'2                   q'2 1 → q'1



A run of an alternating word automaton A on a word w is a finite tree ρ
labeled with Q × N such that:

• The root of ρ is labeled by some pair (q, 0).

• If ρ(p) = (q, i) and i is strictly smaller than the length of w, w(i + 1) = a,
δ(q, a) = φ, then there is a set S = {q1, ..., qn} of states such that S ⊨ φ,
positions p1, ..., pn are the successor positions of p in ρ and ρ(pj) =
(qj, i + 1) for every j = 1, ..., n.

The notion of satisfaction used here is the usual one in propositional calculus:
the set S is the set of propositions assigned to true, while the propositions not
belonging to S are assumed to be assigned to false. Therefore, we have the
following:

• there is no run on w such that w(i + 1) = a for some i, ρ(p) = (q, i) and
δ(q, a) = false

• if δ(q, w(i + 1)) = true and ρ(p) = (q, i), then p can be a leaf node, in
which case it is called a success node.

• All leaf nodes are either success nodes as above or labeled with some (q, n)
such that n is the length of w.

A run of an alternating automaton is successful on w if and only if

• all leaf nodes are either success nodes or labeled with some (q, n), where
n is the length of w, such that q ∈ Qf.

• the root node ρ(ε) = (q0, 0) with q0 ∈ Q0.



Example 60. Let us come back to Example 59. We show on Figure 7.1 two
runs on the word 00101, one of which is successful.

[Figure 7.1: Two runs on the word 00101 of the automaton defined in
Example 59. The right one is successful.]



Note that non-deterministic automata are the particular case of alternating
automata in which only disjunctions (no conjunctions) occur in the transition
relation. In such a case, if there is a successful run on w, then there is also a
successful run which is a string.

Note also that, in the definition of a run, we can always choose the set S to
be a minimal satisfaction set: if there is a successful run of the automaton, then
there is a successful one in which we always choose a minimal set S of states.
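Whether a successful run exists can be decided without building run trees.
Since the formulas are positive, a set S of successful states satisfying δ(q, a)
exists exactly when δ(q, a) evaluates to true once every state is replaced by
"this state succeeds on the rest of the word". The following Python sketch
applies this to the automaton of Example 59; the transition table is the reading
of that example adopted here (with q1' and q2' standing for the primed states),
and the encoding of formulas and the names holds, succeeds and accepts are
assumptions.

```python
# A formula is True, False, a state name (str), ("and", f, g) or ("or", f, g).
DELTA = {
    ("q0", "0"): ("or", ("and", "q0", "q1"), "q1'"),
    ("q0", "1"): "q0",
    ("q1", "0"): "q2",    ("q1", "1"): True,
    ("q2", "0"): "q3",    ("q2", "1"): "q3",
    ("q3", "0"): "q4",    ("q3", "1"): "q4",
    ("q4", "0"): True,    ("q4", "1"): True,
    ("q1'", "0"): "q1'",  ("q1'", "1"): "q2'",
    ("q2'", "0"): "q2'",  ("q2'", "1"): "q1'",
}
INITIAL = {"q0"}
FINAL = {"q0", "q1", "q2", "q3", "q4"}

def holds(formula, word, i):
    """Is `formula` satisfied by the set of states that succeed from position i?"""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        return succeeds(formula, word, i)
    op, left, right = formula
    if op == "and":
        return holds(left, word, i) and holds(right, word, i)
    return holds(left, word, i) or holds(right, word, i)

def succeeds(state, word, i=0):
    """Is there a successful run subtree rooted at a node labeled (state, i)?"""
    if i == len(word):
        return state in FINAL
    # word[i] is the letter w(i + 1) of the text, which numbers letters from 1.
    return holds(DELTA[(state, word[i])], word, i + 1)

def accepts(word):
    return any(succeeds(q, word) for q in INITIAL)

print(accepts("00101"))
```

The recursion may revisit the same pair (state, position) several times;
memoizing succeeds on that pair would bound the work by the number of such
pairs times the size of the formulas.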

7.2.2 Alternating Tree Automata 

Now, let us switch to alternating tree automata: the definitions are simple
adaptations of the previous ones.

Definition 14. An alternating tree automaton over F is a tuple A = (Q, F, I, Δ)
where Q is a set of states, I ⊆ Q is a set of initial states and Δ is a mapping
from Q × F to B+(Q × N) such that Δ(q, f) ∈ B+(Q × {1, ..., Arity(f)}) where
Arity(f) is the arity of f.

Note that this definition corresponds to a top-down automaton, which is
more convenient in the alternating case.

Definition 15. Given a term t ∈ T(F) and an alternating tree automaton A
on F, a run of A on t is a tree ρ on Q × N* such that ρ(ε) = (q, ε) for some
state q and

if ρ(π) = (q, p), t(p) = f and Δ(q, f) = φ, then there is a subset S =
{(q1, i1), ..., (qn, in)} of Q × {1, ..., Arity(f)} such that S ⊨ φ, the
successor positions of π in ρ are {π·1, ..., π·n} and ρ(π·j) = (qj, p·ij)
for every j = 1..n.

A run ρ is successful if ρ(ε) = (q, ε) for some initial state q ∈ I.

Note that (completely specified) non-deterministic top-down tree automata
are the particular case of alternating tree automata. For a set of non-deterministic
rules q(f(x1, ..., xn)) → f(q1(x1), ..., qn(xn)), Δ(q, f) is defined by:

Δ(q, f) = ⋁_{(q1, ..., qn) ∈ S} ⋀_{i=1}^{Arity(f)} (qi, i)

where S is the set of tuples (q1, ..., qn) occurring in the right-hand sides of
these rules.



Example 61. Consider the automaton on the alphabet $\{f(,), a, b\}$ whose
transition relation is defined by:

$\Delta$   $f$                                                                                 $a$     $b$
$q_2$      $[((q_1,1) \wedge (q_2,2)) \vee ((q_1,2) \wedge (q_2,1))] \wedge (q_4,1)$           true    false
$q_1$      $((q_2,1) \wedge (q_2,2)) \vee ((q_1,2) \wedge (q_1,1))$                            false   true
$q_4$      $((q_3,1) \wedge (q_3,2)) \vee ((q_4,1) \wedge (q_4,2))$                            true    true
$q_3$      $((q_3,1) \wedge (q_2,2)) \vee ((q_4,1) \wedge (q_1,2) \wedge (q_5,1))$             false   true
$q_5$      false                                                                               true    false

Assume $I = \{q_2\}$. A run of the automaton on the term $t = f(f(b, f(a, b)), b)$
is depicted in Figure 7.2.
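
As a cross-check of Example 61, acceptance can be tested by evaluating the
transition formulas top-down, treating an atom (q, i) as "the i-th subterm is
accepted from q". The following is only a sketch: the encoding of terms as
nested tuples, the helper names `And`, `Or` and `accepts`, and the grouping
chosen for the $q_3$ row are assumptions of this illustration, not part of the
text.

```python
from functools import lru_cache


def And(*args):
    return ("and",) + args


def Or(*args):
    return ("or",) + args


# Transition table as read from Example 61; an atom (q, i) is a plain pair.
# The grouping used for the q3 row is an assumption of this sketch.
DELTA = {
    ("q2", "f"): And(Or(And(("q1", 1), ("q2", 2)), And(("q1", 2), ("q2", 1))), ("q4", 1)),
    ("q1", "f"): Or(And(("q2", 1), ("q2", 2)), And(("q1", 2), ("q1", 1))),
    ("q4", "f"): Or(And(("q3", 1), ("q3", 2)), And(("q4", 1), ("q4", 2))),
    ("q3", "f"): Or(And(("q3", 1), ("q2", 2)), And(("q4", 1), ("q1", 2), ("q5", 1))),
    ("q5", "f"): False,
    ("q2", "a"): True, ("q1", "a"): False, ("q4", "a"): True,
    ("q3", "a"): False, ("q5", "a"): True,
    ("q2", "b"): False, ("q1", "b"): True, ("q4", "b"): True,
    ("q3", "b"): True, ("q5", "b"): False,
}


@lru_cache(maxsize=None)
def accepts(state, term):
    """True iff `term` (a nested tuple) is accepted from `state`."""
    symbol, children = term[0], term[1:]

    def holds(phi):
        if isinstance(phi, bool):
            return phi
        if phi[0] == "and":
            return all(holds(p) for p in phi[1:])
        if phi[0] == "or":
            return any(holds(p) for p in phi[1:])
        q, i = phi                      # atom (q, i): descend into child i
        return accepts(q, children[i - 1])

    return holds(DELTA[(state, symbol)])


a, b = ("a",), ("b",)


def f(left, right):
    return ("f", left, right)


t = f(f(b, f(a, b)), b)
print(accepts("q2", t))
```

Memoising on (state, subterm) pairs bounds the number of distinct calls by the
number of states times the number of subterms of t.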



In the case of a non-deterministic top-down tree automaton, the different
notions of a run coincide as, in such a case, the run obtained from Definition 15
on a tree t is a tree whose set of positions is the set of positions of t, possibly
changing the ordering of sons.

Words over an alphabet A can be seen as trees over the set of unary function
symbols A and an additional constant #. For convenience, we read the words
from right to left. For instance, aaba is translated into the tree a(b(a(a(#)))).
Then an alternating word automaton $\mathcal{A}$ can be seen as an alternating tree
automaton whose initial states are the final states of $\mathcal{A}$, the transitions are the
same and there are additional rules $\delta(q^0, \#) = \mathit{true}$ for the initial state $q^0$ of $\mathcal{A}$
and $\delta(q, \#) = \mathit{false}$ for other states.
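
The encoding of words as unary trees can be sketched as follows. The tuple
representation and the names `word_to_tree` and `show` are choices made for
this illustration only.

```python
def word_to_tree(word):
    """Encode a word as a unary tree, reading it from right to left."""
    tree = ("#",)                 # the additional constant
    for letter in word:           # each later letter wraps the earlier ones
        tree = (letter, tree)
    return tree


def show(tree):
    """Render a unary tree such as ("a", ("#",)) as the string a(#)."""
    if len(tree) == 1:
        return tree[0]
    return f"{tree[0]}({show(tree[1])})"


print(show(word_to_tree("aaba")))   # prints a(b(a(a(#))))
```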

7.2.3 Tree Automata versus Alternating Word Automata 

It is interesting to remark that, guessing the input tree, it is possible to reduce
the emptiness problem for (non-deterministic, bottom-up) tree automata to
the emptiness problem for an alternating word automaton on a single letter
alphabet: assume that $\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ is a non-deterministic tree automaton,
then construct the alternating word automaton on a one letter alphabet $\{a\}$ as
follows: the states are $Q \times \mathcal{F}$, the initial states are $Q_f \times \mathcal{F}$ and the transition
rules are:

\[ \delta((q, f), a) = \bigvee_{f(q_1, \ldots, q_n) \to q \,\in\, \Delta} \; \bigwedge_{i=1}^{n} \; \bigvee_{f_j \in \mathcal{F}} (q_i, f_j) \]

Conversely, it is also possible to reduce the emptiness problem for an alternating
word automaton over a one letter alphabet $\{a\}$ to the emptiness problem
of non-deterministic tree automata, introducing a new function symbol for
each conjunction; assume the formulas in disjunctive normal form (this can
be assumed w.l.o.g., see Exercise 82), then replace each transition
$\delta(q, a) = \bigvee_{i=1}^{n} \bigwedge_{j=1}^{k_i} (q_{i,j}, i)$ with $f_i(q_{i,1}, \ldots, q_{i,k_i}) \to q$.

Figure 7.2: A run of an alternating tree automaton

7.3 Closure Properties 

One nice feature of alternating automata is that it is very easy to perform the
Boolean operations (for short, we identify here the automaton with the language
it recognizes). First, we show that we can consider automata
with only one initial state, without loss of generality.

Lemma 10. Given an alternating tree automaton $\mathcal{A}$, we can compute in linear
time an automaton $\mathcal{A}'$ with only one initial state and which accepts the same
language as $\mathcal{A}$.

Proof. Add one state $q^0$ to $\mathcal{A}$, which will become the only initial state, and the
transitions:

\[ \delta(q^0, f) = \bigvee_{q \in I} \delta(q, f) \]

□

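Proposition 47. Union, intersection and complement of alternating tree automata
can be performed in linear time.

Proof. We consider w.l.o.g. automata with only one initial state. Given $\mathcal{A}_1$ and
$\mathcal{A}_2$, with disjoint sets of states, we compute an automaton $\mathcal{A}$ whose states are
those of $\mathcal{A}_1$ and $\mathcal{A}_2$ and one additional state $q^0$. Transitions are those of $\mathcal{A}_1$
and $\mathcal{A}_2$ plus the additional transitions for the union:

\[ \delta(q^0, f) = \delta_1(q_1^0, f) \vee \delta_2(q_2^0, f) \]

where $q_1^0$, $q_2^0$ are the initial states of $\mathcal{A}_1$ and $\mathcal{A}_2$ respectively. For the intersection,
we add instead the transitions:

\[ \delta(q^0, f) = \delta_1(q_1^0, f) \wedge \delta_2(q_2^0, f) \]

Concerning the complement, we simply exchange $\wedge$ and $\vee$ (resp. true and
false) in the transitions. The resulting automaton $\widetilde{\mathcal{A}}$ will be called the dual
automaton in what follows.

The proof that these constructions are correct for union and intersection is
left to the reader. Let us only consider here the complement.

We prove, by induction on the size of $t$ that, for every state $q$, $t$ is accepted
either by $\mathcal{A}$ or by $\widetilde{\mathcal{A}}$ in state $q$, and not by both automata.

If $t$ is a constant $a$, then $\delta(q, a)$ is either true or false. If $\delta(q, a) = \mathit{true}$,
then $\widetilde{\delta}(q, a) = \mathit{false}$ and $t$ is accepted by $\mathcal{A}$ and not by $\widetilde{\mathcal{A}}$. The other case is
symmetric.

Assume now that $t = f(t_1, \ldots, t_n)$ and $\delta(q, f) = \varphi$. Let $S$ be the set of
pairs $(q_j, i_j)$ such that $t_{i_j}$ is accepted from state $q_j$ by $\mathcal{A}$. $t$ is accepted by $\mathcal{A}$ iff
$S \models \varphi$. Let $\widetilde{S}$ be the complement of $S$ in $Q \times [1..n]$. By induction hypothesis,
$(q_j, i) \in \widetilde{S}$ iff $t_i$ is accepted in state $q_j$ by $\widetilde{\mathcal{A}}$.

We show that $S \models \varphi$ iff $\widetilde{S} \not\models \widetilde{\varphi}$. ($\widetilde{\varphi}$ is the dual formula, obtained by
exchanging $\wedge$ and $\vee$ on one hand and true and false on the other hand in $\varphi$.) We
show this by induction on the size of $\varphi$. If $\varphi$ is true, then $S \models \varphi$ and, since
$\widetilde{\varphi}$ is false, $\widetilde{S} \not\models \widetilde{\varphi}$; the case where $\varphi$ is false is symmetric. If $\varphi$ is an atom
$(q', i)$, then $\widetilde{\varphi} = \varphi$ and $S \models \varphi$ iff $(q', i) \in S$ iff $(q', i) \notin \widetilde{S}$ iff $\widetilde{S} \not\models \widetilde{\varphi}$.
Now, let $\varphi$ be, e.g., $\varphi_1 \wedge \varphi_2$. $S \not\models \varphi$ iff either $S \not\models \varphi_1$ or $S \not\models \varphi_2$, which, by
induction hypothesis, is equivalent to $\widetilde{S} \models \widetilde{\varphi_1}$ or $\widetilde{S} \models \widetilde{\varphi_2}$. By construction, this
is equivalent to $\widetilde{S} \models \widetilde{\varphi}$. The case $\varphi = \varphi_1 \vee \varphi_2$ is similar.

Now $t$ is accepted in state $q$ by $\mathcal{A}$ iff $S \models \varphi$ iff $\widetilde{S} \not\models \widetilde{\varphi}$ iff $t$ is not accepted in
state $q$ by $\widetilde{\mathcal{A}}$. □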
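
The key step of the complement argument, namely that S satisfies a formula
iff the complement of S does not satisfy the dual formula, can be checked
exhaustively on small instances. The sketch below assumes formulas encoded as
nested tuples; the names `sat`, `dual` and `check_duality` are hypothetical
helpers for this illustration.

```python
from itertools import chain, combinations


def sat(S, phi):
    """Does the set of atoms S satisfy the positive formula phi?"""
    if isinstance(phi, bool):
        return phi
    if phi[0] == "and":
        return all(sat(S, p) for p in phi[1:])
    if phi[0] == "or":
        return any(sat(S, p) for p in phi[1:])
    return phi in S                     # phi is an atom (q, i)


def dual(phi):
    """Exchange and/or and true/false; atoms are left unchanged."""
    if isinstance(phi, bool):
        return not phi
    if phi[0] == "and":
        return ("or",) + tuple(dual(p) for p in phi[1:])
    if phi[0] == "or":
        return ("and",) + tuple(dual(p) for p in phi[1:])
    return phi


def check_duality(phi, universe):
    """Check S |= phi  iff  complement(S) |/= dual(phi), for every S."""
    atoms = sorted(universe)
    subsets = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1))
    return all(
        sat(set(S), phi) != sat(set(atoms) - set(S), dual(phi))
        for S in subsets)


universe = {(q, i) for q in ("q1", "q2") for i in (1, 2)}
phi = ("and", ("or", ("q1", 1), ("q2", 2)), ("q2", 1))
print(check_duality(phi, universe))
```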

7.4 From Alternating to Deterministic Automata 

The expressive power of alternating automata is exactly the same as that of
finite (bottom-up) tree automata.

Theorem 54. If $\mathcal{A}$ is an alternating tree automaton, then there is a finite
deterministic bottom-up tree automaton $\mathcal{A}'$ which accepts the same language.
$\mathcal{A}'$ can be computed from $\mathcal{A}$ in deterministic exponential time.

Proof. Assume $\mathcal{A} = (Q, \mathcal{F}, I, \Delta)$, then $\mathcal{A}' = (2^Q, \mathcal{F}, Q_f, \delta)$ where
$Q_f = \{S \in 2^Q \mid S \cap I \neq \emptyset\}$ and $\delta$ is defined as follows:

\[ f(S_1, \ldots, S_n) \to \{q \in Q \mid S_1 \times \{1\} \cup \ldots \cup S_n \times \{n\} \models \Delta(q, f)\} \]

A term $t$ is accepted by $\mathcal{A}'$ in state $S$ iff $S$ is the set of all states $q$ in which
$t$ is accepted by $\mathcal{A}$. This is proved by induction on the size of $t$: if $t$ is a
constant, then $t$ is accepted in all states $q$ such that $\Delta(q, t) = \mathit{true}$. Now, if
$t = f(t_1, \ldots, t_n)$, let $S_1, \ldots, S_n$ be the sets of states in which $t_1, \ldots, t_n$ are
respectively accepted by $\mathcal{A}$. $t$ is accepted by $\mathcal{A}$ in a state $q$ iff there is
$S_0 \subseteq Q \times \{1, \ldots, n\}$ such that $S_0 \models \Delta(q, f)$ and, for every pair $(q_i, j) \in S_0$,
$t_j$ is accepted in $q_i$. In other words, $t$ is accepted by $\mathcal{A}$ in state $q$ iff there is an
$S_0 \subseteq S_1 \times \{1\} \cup \ldots \cup S_n \times \{n\}$ such that $S_0 \models \Delta(q, f)$, which is in turn
equivalent to $S_1 \times \{1\} \cup \ldots \cup S_n \times \{n\} \models \Delta(q, f)$, since the formulas are positive
and hence monotone. We conclude by an application of the induction hypothesis. □
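
The transition rule of the proof can be sketched directly: the state reached
by the deterministic automaton on a term is computed bottom-up from the sets
reached on its subterms. The two-state automaton below is a made-up toy
(states `p`, `q`, initial state `p`), and the names `det_step`, `reach` and
`accepted` are assumptions of this illustration.

```python
def sat(S, phi):
    """Does the set of atoms S satisfy the positive formula phi?"""
    if isinstance(phi, bool):
        return phi
    if phi[0] == "and":
        return all(sat(S, p) for p in phi[1:])
    if phi[0] == "or":
        return any(sat(S, p) for p in phi[1:])
    return phi in S


def det_step(delta, states, symbol, child_sets):
    """f(S_1, ..., S_n) -> {q | S_1 x {1} u ... u S_n x {n} |= Delta(q, f)}."""
    S = {(q, i) for i, S_i in enumerate(child_sets, start=1) for q in S_i}
    return frozenset(q for q in states if sat(S, delta[(q, symbol)]))


def reach(delta, states, term):
    """State of the subset automaton reached on a term (nested tuple)."""
    children = [reach(delta, states, c) for c in term[1:]]
    return det_step(delta, states, term[0], children)


# Hypothetical toy automaton, not taken from the text.
STATES = ("p", "q")
DELTA = {
    ("p", "a"): True, ("q", "a"): False,
    ("p", "b"): False, ("q", "b"): True,
    ("p", "f"): ("and", ("p", 1), ("q", 2)),
    ("q", "f"): ("or", ("q", 1), ("q", 2)),
}
INITIAL = {"p"}


def accepted(term):
    """Final states are the subsets that meet the initial states."""
    return bool(reach(DELTA, STATES, term) & INITIAL)


a, b = ("a",), ("b",)
print(sorted(reach(DELTA, STATES, ("f", a, b))), accepted(("f", a, b)))
```

Only the subsets actually reached on the given term are computed here; the
full automaton of Theorem 54 has all of $2^Q$ as its state set.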

Unfortunately the exponential blow-up is unavoidable, as a consequence of
Proposition 47 and Theorems 14 and 11.



7.5 Decision Problems and Complexity Issues

Theorem 55. The emptiness problem and the universality problem for alternating
tree automata are DEXPTIME-complete.

Proof. The DEXPTIME membership is a consequence of Theorems 11 and 54.
The DEXPTIME-hardness is a consequence of Proposition 47 and Theorem 14. □

The membership problem (given $t$ and $\mathcal{A}$, is $t$ accepted by $\mathcal{A}$?) can be
decided in polynomial time. This is left as an exercise (see Exercise 83).

7.6 Horn Logic, Set Constraints and Alternating Automata

7.6.1 The Clausal Formalism

Viewing every state $q$ as a unary predicate symbol $P_q$, tree automata can be
translated into Horn clauses in such a way that the language recognized in state
$q$ is exactly the interpretation of $P_q$ in the least Herbrand model of the set of
clauses.

There are several advantages of this point of view:

• Since the logical setting is declarative, we don't have to distinguish between
top-down and bottom-up automata. In particular, we have a definition
of bottom-up alternating automata for free.

• Alternation can be expressed in a simple way, as well as push and pop
operations, as described in the next section.

• There is no need to define a run (which would correspond to a proof in
the logical setting).

• Several decision properties can be translated into decidability problems for
such clauses. Typically, since all clauses belong to the monadic fragment,
there are decision procedures, e.g. relying on ordered resolution strategies.

There are also weaknesses: complexity issues are harder to study in this
setting. Many constructive proofs and complexity results have been obtained
with tree automata techniques.

Tree automata can be translated into Horn clauses. With a tree automaton
$\mathcal{A} = (Q, \mathcal{F}, Q_f, \Delta)$ is associated the following set of Horn clauses:

\[ P_q(f(x_1, \ldots, x_n)) \leftarrow P_{q_1}(x_1), \ldots, P_{q_n}(x_n) \]

if $f(q_1, \ldots, q_n) \to q \in \Delta$. The language accepted by the automaton is the union
of the interpretations of $P_q$, for $q \in Q_f$, in the least Herbrand model of the clauses.

Also, alternating tree automata can be translated into Horn clauses. Alternation
can be expressed by variable sharing in the body of the clause. Consider
an alternating tree automaton $(Q, \mathcal{F}, I, \Delta)$. Assume that the transitions
are in disjunctive normal form (see Exercise 82). With a transition
$\Delta(q, f) = \bigvee_{i=1}^{m} \bigwedge_{j=1}^{k_i} (q_j, i_j)$ are associated the clauses, one for each disjunct $i$,

\[ P_q(f(x_1, \ldots, x_n)) \leftarrow \bigwedge_{j=1}^{k_i} P_{q_j}(x_{i_j}) \]

We can also add $\varepsilon$-transitions, by allowing clauses

\[ P(x) \leftarrow Q(x) \]

In such a setting, automata with equality constraints between brothers,
which are studied in Section 4.3, are simply an extension of the above class
of Horn clauses, in which we allow repeated variables in the head of the clause.
Allowing variable repetition in an arbitrary way, we get alternating automata
with constraints between brothers, a class of automata for which emptiness is
decidable in deterministic exponential time. (It is expressible in Löwenheim's
class with equality, also sometimes called the monadic class.)

Still, for tight complexity bounds and for closure properties (typically by
complementation) of automata with equality tests between brothers, we refer to
Section 4.3. Note that it is not easy to derive the complexity results obtained
with tree automata techniques in a logical framework.

7.6.2 The Set Constraints Formalism 

We introduced and studied general set constraints in Chapter 5. Set constraints
and, more precisely, definite set constraints provide an alternative description
of tree automata.

Definite set constraints are conjunctions of inclusions

\[ e \subseteq t \]

where $e$ is a set expression built using function application, intersection and
variables and $t$ is a term set expression, constructed using function application
and variables only.

Given an assignment $\sigma$ of variables to subsets of $T(\mathcal{F})$, we can interpret the
set expressions as follows:

\[ [\![ f(e_1, \ldots, e_n) ]\!]_\sigma \stackrel{\mathrm{def}}{=} \{ f(t_1, \ldots, t_n) \mid t_i \in [\![ e_i ]\!]_\sigma \} \]
\[ [\![ e_1 \cap e_2 ]\!]_\sigma \stackrel{\mathrm{def}}{=} [\![ e_1 ]\!]_\sigma \cap [\![ e_2 ]\!]_\sigma \]
\[ [\![ X ]\!]_\sigma \stackrel{\mathrm{def}}{=} X\sigma \]

Then $\sigma$ is a solution of a set constraint if the inclusions hold for the corresponding
interpretation of expressions.

When we restrict the right members of inclusions to variables, we get another
formalism for alternating tree automata: such set constraints always have
a least solution, which is accepted by an alternating tree automaton. More
precisely, we can use the following translation from the alternating automaton
$\mathcal{A} = (Q, \mathcal{F}, I, \Delta)$: assume again that the transitions are in disjunctive normal
form (see Exercise 82) and construct the inclusion constraints

\[ f(X_{1,f,q,d}, \ldots, X_{n,f,q,d}) \subseteq X_q \]
\[ \bigcap_{(q', j) \in d} X_{q'} \subseteq X_{j,f,q,d} \]

for every $(q, f) \in Q \times \mathcal{F}$ and $d$ a disjunct of $\Delta(q, f)$. (An intersection over an
empty set has to be understood as the set of all trees.)

Then, the language recognized by the alternating tree automaton is the
union, for $q \in I$, of $X_q\sigma$ where $\sigma$ is the least solution of the constraint.

Actually, we are constructing the constraint in exactly the same way as we
constructed the clauses in the previous section. When there is no alternation, we
get an alternative definition of non-deterministic automata, which corresponds
to the algebraic characterization of Chapter 2.

Conversely, if all right members of the definite set constraint are variables,
it is not difficult to construct an alternating tree automaton which accepts the
least solution of the constraint (see Exercise 85).

7.6.3 Two Way Alternating Tree Automata

Definite set constraints look more expressive than alternating tree automata,
because inclusions

\[ X \subseteq f(Y, Z) \]

cannot be directly translated into automata rules.

We define here two-way tree automata, which will easily correspond to definite
set constraints on one hand and allow us to simulate, e.g., the behavior of standard
pushdown word automata on the other hand.

It is convenient here to use the clausal formalism in order to define such
automata. A clause

\[ P(u) \leftarrow P_1(x_1), \ldots, P_n(x_n) \]

where $u$ is a linear, non-variable term and $x_1, \ldots, x_n$ are (not necessarily
distinct) variables occurring in $u$, is called a push clause. A clause

\[ P(x) \leftarrow Q(t) \]

where $x$ is a variable and $t$ is a linear term, is called a pop clause. A clause

\[ P(x) \leftarrow P_1(x), \ldots, P_n(x) \]

is called an alternating clause (or an intersection clause).

Definition 16. An alternating two-way tree automaton is a tuple $(Q, Q_f, \mathcal{F}, C)$
where $Q$ is a finite set of unary predicate symbols, $Q_f$ is a subset of $Q$ and $C$ is a
finite set of clauses each of which is a push clause, a pop clause or an alternating
clause.

Such an automaton accepts a tree $t$ if $t$ belongs to the interpretation of some
$P \in Q_f$ in the least Herbrand model of the clauses.

Example 62. Consider the following alternating two-way automaton on the
alphabet $\mathcal{F} = \{a, f(,)\}$:

1. $P_1(f(f(x_1, x_2), x_3)) \leftarrow P_2(x_1), P_2(x_2), P_2(x_3)$
2. $P_2(a)$
3. $P_1(f(a, x)) \leftarrow P_2(x)$
4. $P_3(f(x, y)) \leftarrow P_1(x), P_2(y)$
5. $P_4(x) \leftarrow P_3(x), P_1(x)$
6. $P_2(x) \leftarrow P_4(f(x, y))$
7. $P_1(y) \leftarrow P_4(f(x, y))$

The clauses 1, 2, 3, 4 are push clauses. Clause 5 is an alternating clause and
clauses 6, 7 are pop clauses.

If we compute the least Herbrand model, we successively get for the first five
steps:

step    1    2                       3                                  4              5
$P_1$        f(a,a), f(f(a,a),a)                                                       a
$P_2$   a                                                                              f(a,a)
$P_3$                                f(f(a,a),a), f(f(f(a,a),a),a)
$P_4$                                                                   f(f(a,a),a)
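
The first rounds of this least-model computation can be replayed by naive
forward chaining: at each step, every clause is applied to the facts obtained
so far. The sketch below is only an illustration; terms and patterns are
nested tuples, pattern variables are strings, and the names `match`,
`instantiate`, `step` and `first_steps` are hypothetical.

```python
def match(pattern, term, subst):
    """Match a pattern against a ground term; return a substitution or None."""
    if isinstance(pattern, str):                    # a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def instantiate(pattern, subst):
    if isinstance(pattern, str):
        return subst[pattern]
    return (pattern[0],) + tuple(instantiate(p, subst) for p in pattern[1:])


def step(clauses, facts):
    """All heads derivable in one round from the given set of facts."""
    derived = set()
    for (head_pred, head_pat), body in clauses:
        substs = [{}]
        for pred, pat in body:
            extended = []
            for s in substs:
                for fact_pred, term in facts:
                    if fact_pred == pred:
                        s2 = match(pat, term, s)
                        if s2 is not None:
                            extended.append(s2)
            substs = extended
        for s in substs:
            derived.add((head_pred, instantiate(head_pat, s)))
    return derived


def first_steps(clauses, n):
    """New facts obtained at each of the first n rounds."""
    facts, history = set(), []
    for _ in range(n):
        new = step(clauses, facts) - facts
        history.append(new)
        facts |= new
    return history


A = ("a",)


def f(x, y):
    return ("f", x, y)


# Clauses 1 to 7 of Example 62, as (head, body) pairs.
CLAUSES = [
    (("P1", f(f("x1", "x2"), "x3")), [("P2", "x1"), ("P2", "x2"), ("P2", "x3")]),
    (("P2", A), []),
    (("P1", f(A, "x")), [("P2", "x")]),
    (("P3", f("x", "y")), [("P1", "x"), ("P2", "y")]),
    (("P4", "x"), [("P3", "x"), ("P1", "x")]),
    (("P2", "x"), [("P4", f("x", "y"))]),
    (("P1", "y"), [("P4", f("x", "y"))]),
]

for number, new in enumerate(first_steps(CLAUSES, 5), start=1):
    print(number, sorted(new))
```

The least Herbrand model itself is infinite here, so the loop is cut off after
a fixed number of rounds rather than run to a fixpoint.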
These automata are often convenient in expressing some problems (see the
exercises and bibliographic notes). However they do not increase the expressive
power of (alternating) tree automata:

Theorem 56. For every alternating two-way tree automaton, it is possible to
compute in deterministic exponential time a tree automaton which accepts the
same language.

We do not prove the result here (see the bibliographic notes instead). A
simple way to compute the equivalent tree automaton is as follows: first flatten
the clauses, introducing new predicate symbols. Then saturate the set of
clauses, using ordered resolution (w.r.t. the subterm ordering) and keeping only
non-subsumed clauses. The saturation process terminates in exponential time. The
desired automaton is obtained by simply keeping only the push clauses of this
resulting set of clauses.



Example 63. Let us come back to Example 62 and show how we get an equivalent
finite tree automaton.

First flatten the clauses: Clause 1 becomes

1. $P_1(f(x, y)) \leftarrow P_5(x), P_2(y)$
8. $P_5(f(x, y)) \leftarrow P_2(x), P_2(y)$

Now we start applying resolution:

From 4 + 5:    9. $P_4(f(x, y)) \leftarrow P_1(x), P_2(y), P_1(f(x, y))$
From 9 + 6:   10. $P_2(x) \leftarrow P_1(x), P_2(y)$
From 10 + 2:  11. $P_2(x) \leftarrow P_1(x)$

Clause 11 subsumes 10, which is deleted.

From 9 + 7:   12. $P_1(y) \leftarrow P_1(x), P_2(y)$
From 12 + 1:  13. $P_1(y) \leftarrow P_2(y), P_5(x), P_2(z)$
From 13 + 8:  14. $P_1(y) \leftarrow P_2(y), P_2(x_1), P_2(x_2), P_2(z)$

Clause 14 can be simplified and, by superposition with 2, we get

From 14 + 2:  15. $P_1(y) \leftarrow P_2(y)$

At this stage, from 11 and 15 we have $P_1(x) \leftrightarrow P_2(x)$, hence, for simplicity,
we will only consider $P_1$, replacing every occurrence of $P_2$ with $P_1$.

From 1 + 5:   16. $P_4(f(x, y)) \leftarrow P_3(f(x, y)), P_5(x), P_1(y)$
From 1 + 9:   17. $P_4(f(x, y)) \leftarrow P_1(x), P_1(y), P_5(x)$
From 2 + 5:   18. $P_4(a) \leftarrow P_3(a)$
From 3 + 5:   19. $P_4(f(a, x)) \leftarrow P_3(f(a, x)), P_1(x)$
From 3 + 9:   20. $P_4(f(a, x)) \leftarrow P_1(x), P_1(a)$
From 2 + 20:  21. $P_4(f(a, x)) \leftarrow P_1(x)$

Clause 21 subsumes both 20 and 19. These two clauses are deleted.

From 5 + 6:   22. $P_1(x) \leftarrow P_3(f(x, y)), P_1(f(x, y))$
From 5 + 7:   23. $P_1(y) \leftarrow P_3(f(x, y)), P_1(f(x, y))$
From 16 + 6:  24. $P_1(x) \leftarrow P_3(f(x, y)), P_5(x), P_1(y)$
From 23 + 1:  25. $P_1(y) \leftarrow P_3(f(x, y)), P_5(x), P_1(y)$

Now every new inference yields a redundant clause and the saturation terminates,
yielding the automaton:

1.  $P_1(f(x, y)) \leftarrow P_5(x), P_2(y)$
2.  $P_1(a)$
3.  $P_1(f(a, x)) \leftarrow P_1(x)$
4.  $P_3(f(x, y)) \leftarrow P_1(x), P_1(y)$
8.  $P_5(f(x, y)) \leftarrow P_1(x), P_1(y)$
11. $P_2(x) \leftarrow P_1(x)$
15. $P_1(y) \leftarrow P_2(y)$
21. $P_4(f(a, x)) \leftarrow P_1(x)$

Of course, this automaton can be simplified: $P_1$ and $P_2$ accept all terms in
$T(\mathcal{F})$.

It follows from Theorems 56, 55 and 11 that the emptiness problem and the
universality problem are DEXPTIME-complete for two-way alternating
automata.

7.6.4 Two Way Automata and Definite Set Constraints 

There is a simple reduction of two-way automata to definite set constraints:

A push clause $P(f(x_1, \ldots, x_n)) \leftarrow P_1(x_{i_1}), \ldots, P_n(x_{i_n})$ corresponds to an
inclusion constraint

\[ f(e_1, \ldots, e_n) \subseteq X_P \]

where each $e_j$ is the intersection, for $i_k = j$, of the variables $X_{P_k}$. A (conditional)
pop clause $P(x_i) \leftarrow Q(f(x_1, \ldots, x_n)), P_1(x_{i_1}), \ldots, P_k(x_{i_k})$ corresponds to

\[ f(e_1, \ldots, e_n) \cap X_Q \subseteq f(\top, \ldots, X_P, \top, \ldots) \]

where, again, each $e_j$ is the intersection, for $i_k = j$, of the variables $X_{P_k}$ and $\top$ is a
variable containing all term expressions. Intersection clauses $P(x) \leftarrow Q(x), R(x)$
correspond to constraints

\[ X_Q \cap X_R \subseteq X_P \]

Conversely, we can translate the definite set constraints into two-way automata,
with additional restrictions on some states. We cannot do better since
a definite set constraint could be unsatisfiable.

Introducing auxiliary variables, we only have to consider constraints:

1. $f(X_1, \ldots, X_n) \subseteq X$,

2. $X_1 \cap \ldots \cap X_n \subseteq X$,

3. $X \subseteq f(X_1, \ldots, X_n)$.

The first kind of constraints is translated to push clauses, the second kind of
constraints is translated to intersection clauses. Consider the last kind of
constraints. It can be translated into the pop clauses:

\[ P_{X_i}(x_i) \leftarrow P_X(f(x_1, \ldots, x_n)) \]

with the provision that all terms in $P_X$ are headed with $f$.

Then the procedure which solves definite set constraints is essentially the
same as the one we sketched for the proof of Theorem 56, except that we have
to add unit negative clauses which may yield failure rules.

Example 64. Consider the definite set constraint

\[ f(X, Y) \cap X \subseteq f(Y, X), \quad f(a, Y) \subseteq X, \quad a \subseteq Y, \quad f(f(Y, Y), Y) \subseteq X \]

Starting from this constraint, we get the clauses of Example 62, with the
additional restriction

26. $\neg P_4(a)$

since every term accepted in $P_4$ has to be headed with $f$.

If we saturate this constraint as in Example 63, we get the same clauses, of
course, but also negative clauses resulting from the new negative clause:

From 26 + 18:  27. $\neg P_3(a)$

And that is all: the constraint is satisfiable, with a minimal solution described
by the automaton resulting from the computation of Example 63.


7.6.5 Two Way Automata and Pushdown Automata 

Two-way automata, though related to pushdown automata, are quite different.
In fact, for every pushdown automaton, it is easy to construct a two-way
automaton which accepts the possible contents of the stack (see Exercise 86).
However, two-way tree (resp. word) automata have the same expressive power
as standard tree (resp. word) automata: they only accept regular languages,
while pushdown automata accept context-free languages, which strictly contain
regular languages.

Note still that, as a corollary of Theorem 56, the language of possible stack
contents in a pushdown automaton is regular.

7.7 An (other) example of application 

Two-way automata naturally arise in the analysis of cryptographic protocols.
In this context, terms are constructed using the function symbols $\{\_\}\_$ (binary
encryption symbol), $\langle \_, \_ \rangle$ (pairing) and constants (and other symbols which
are irrelevant here). The so-called Dolev-Yao model consists in the deduction
rules of Figure 7.3, which express the capabilities of an intruder. For simplicity,
we only consider here symmetric encryption keys, but there are similar rules for
public key cryptosystems. The rules basically state that an intruder can encrypt
a known message with a known key, can decrypt a known message encrypted
with k, provided he knows k, and can form and decompose pairs.

Pairing:      $\dfrac{u \quad v}{\langle u, v \rangle}$
Encryption:   $\dfrac{u \quad v}{\{u\}_v}$
Unpairing L:  $\dfrac{\langle u, v \rangle}{u}$
Unpairing R:  $\dfrac{\langle u, v \rangle}{v}$
Decryption:   $\dfrac{\{u\}_v \quad v}{u}$

Figure 7.3: The Dolev-Yao intruder capabilities

It is easy to construct a two-way automaton which, given a regular set of
terms R, accepts the set of terms that can be derived by an intruder using the
rules of Figure 7.3 (see Exercise 87).
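
For a finite set of known messages, the intruder rules can also be explored
directly, without any automaton. The sketch below is not the two-way automaton
construction of Exercise 87: it first closes the knowledge under decomposition
(unpairing, and decryption with a key that can be built), then checks whether
the target can be composed. That this two-phase strategy loses no derivation
is taken here as an assumption, and the names `pair`, `enc`, `can_build`,
`analyze` and `derivable` are hypothetical.

```python
def pair(u, v):
    return ("pair", u, v)


def enc(u, k):
    return ("enc", u, k)


def can_build(term, known):
    """Can `term` be composed from `known` by pairing and encryption only?"""
    if term in known:
        return True
    if isinstance(term, tuple):        # ("pair", u, v) or ("enc", u, k)
        return can_build(term[1], known) and can_build(term[2], known)
    return False


def analyze(known):
    """Close `known` under unpairing and decryption with derivable keys."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for t in list(known):
            if not isinstance(t, tuple):
                continue               # constants cannot be decomposed
            if t[0] == "pair":
                parts = [t[1], t[2]]
            elif can_build(t[2], known):   # encryption whose key is derivable
                parts = [t[1]]
            else:
                parts = []
            for p in parts:
                if p not in known:
                    known.add(p)
                    changed = True
    return known


def derivable(target, initial):
    return can_build(target, analyze(initial))


secret = enc(pair("s", "n"), "k")
print(derivable("s", {secret, "k"}), derivable("s", {secret}))
```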



7.8 Exercises 

Exercise 82. Show that, for every alternating tree automaton, it is possible to
compute in polynomial time an alternating tree automaton which accepts the same
language and whose transitions are in disjunctive normal form, i.e. each transition
has the form

\[ \delta(q, f) = \bigvee_{i=1}^{m} \bigwedge_{j=1}^{k_i} (q_j, l_j) \]

Exercise 83. Show that the membership problem for alternating tree automata can
be decided in polynomial time.

Exercise 84. An alternating automaton is weak if there is an ordering on the set of
states such that, for every state $q$ and every function symbol $f$, every state $q'$ occurring
in $\delta(q, f)$ satisfies $q' < q$.

Prove that the emptiness of weak alternating tree automata is in PTIME.

Exercise 85. Given a definite set constraint all of whose right-hand sides are variables,
show how to construct (in polynomial time) $k$ alternating tree automata which accept
respectively $X_1\sigma, \ldots, X_k\sigma$ where $\sigma$ is the least solution of the constraint.

Exercise 86. A pushdown automaton on words is a tuple $(Q, Q_f, A, \Gamma, \delta)$ where $Q$ is
a finite set of states, $Q_f \subseteq Q$, $A$ is a finite alphabet of input symbols, $\Gamma$ is a finite
alphabet of stack symbols and $\delta$ is a transition relation defined by rules
$q\,a \xrightarrow{w} q'$ and $q\,a \xrightarrow{w'^{-1}} q'$ where $q, q' \in Q$, $a \in A$ and $w, w' \in \Gamma^*$.

A configuration is a pair of a state and a word $\gamma \in \Gamma^*$. The automaton may move,
when reading $a$, from $(q, \gamma)$ to $(q', \gamma')$ if either there is a transition $q\,a \xrightarrow{w} q'$ and
$\gamma' = w \cdot \gamma$ or there is a transition $q\,a \xrightarrow{w^{-1}} q'$ and $\gamma = w \cdot \gamma'$.

1. Show how to compute (in polynomial time) a two-way automaton which accepts
$w$ in state $q$ iff the configuration $(q, w)$ is reachable.

2. This can be slightly generalized considering alternating pushdown automata:
now assume that the transitions are of the form $q\,a \xrightarrow{w} \varphi$ and $q\,a \xrightarrow{w'^{-1}} \varphi$
where $\varphi \in \mathcal{B}^+(Q)$. Give a definition of a run and of an accepted word, which is
consistent with both the definition of a pushdown automaton and the definition
of an alternating automaton.

3. Generalize the result of the first question to alternating pushdown automata.

4. Generalize the previous questions to tree automata.

Exercise 87. Given a finite tree automaton $\mathcal{A}$ over the alphabet $\{a, \{\_\}\_, \langle \_, \_ \rangle\}$,
construct a two-way tree automaton which accepts the set of terms $t$ which can be
deduced by the rules of Figure 7.3 and the rule

\[ \dfrac{\ }{t} \quad \text{if } t \text{ is accepted by } \mathcal{A} \]

7.9 Bibliographic Notes

Alternation has been considered for a long time as a computation model, e.g. for
Turing machines. The seminal work in this area is [CKS81], in which the
relationship between complexity classes defined using (non)-deterministic machines
and alternating machines is studied.

Concerning tree automata, alternation has been mainly considered in the
case of infinite trees. This is especially useful to keep small representations
of automata associated with temporal logic formulas, yielding optimal
model-checking algorithms [KVW00].

Two-way automata and their relationship with clauses were first considered
in [FSVY91] for the analysis of logic programs. They also occur naturally
in the context of definite set constraints, as we have seen (the completion
mechanisms are presented in, e.g., [HJ90a, CP97]), and in the analysis of cryptographic
protocols [Gou00].

There are several other definitions of two-way tree automata. We can distinguish
between two-way automata which have the same expressive power as regular
languages and what we refer to here as pushdown automata, whose expressive
power is beyond regularity.

Decision procedures based on ordered resolution strategies can be found
in [Jr.76].

Alternating automata with constraints between brothers define a class of
languages expressible in Löwenheim's class with equality, also sometimes called
the monadic class. See for instance [BGG97].